================================================================================ LECTURE 001 ================================================================================ General Intro | Stanford CS221: Artificial Intelligence: Principles and Techniques (Autumn 2021) Source: https://www.youtube.com/watch?v=ZiwogMtbjr4 --- Transcript [00:00:05] Okay, hello everyone. I'm Dorsa Sadigh, and I am one of the co-instructors of CS221. Today I'm here with Percy Liang and our group of TAs to teach the first lecture of 221. So with that, before getting started on the details of the class, I would like to first introduce the team. I am Dorsa Sadigh, an assistant professor in computer science. This is the fifth time I'm teaching CS221, the second time I'm teaching it online, and I think the third time I'm co-teaching with Percy Liang, so I'm really excited to start the quarter with you all. A little bit about my research: my work is in robotics and AI, and in
general I'm very interested in the interaction of robotics and AI agents with humans and with other intelligent agents. So if these topics are of interest, come to office hours; we'd love to chat about them and talk offline in general too. My co-instructor today is Percy Liang. I think I saw Percy somewhere... [00:01:18] Yeah, I'm here. Hello everyone, I'm Percy, the co-instructor, and I think this is my ninth or tenth year teaching 221. It's really been interesting how AI has evolved since when I first started talking about it. My research interests are in machine learning and natural language processing, thinking about how to make systems more robust and trustworthy. Recently I've been really fascinated by what we've been calling foundation models, models such as GPT-3, BERT, and DALL-E, and I'm going to discuss that more
later in the class. [00:01:55] All right, thank you, Percy. So what are we going to be talking about today? Our plan is to talk a little bit about the course logistics and then the course content, what we are actually going to cover as part of this class. Then we'll have some icebreakers, a five-minute breakout room where we'll discuss things about AI. Toward the end of the class I'm going to give a brief history of AI, then talk about what AI is today, what some risks and benefits of AI are, and how we should think about it moving forward. So that is our plan for today. Before I start: if there are any questions, feel free to put them in the Zoom chat or raise your hand, and the CAs can try to answer the questions or
ask them throughout as I give the talk. [00:02:48] All right, so let's talk about course logistics. We're going to have a set of activities as part of the class. Last year, when we had to go virtual because of COVID, we started experimenting with a few different ways of changing and reformatting the class; some of them worked really well and some of them didn't work so well. Based on that experience from last year, we've decided to switch up the activities a little bit more, and some of these changes really make sense to keep even during a normal quarter, even when we're not virtual. One of these changes is going from traditional lectures to something we're calling modules. These are basically pre-recorded lectures that
are broken into small bite-sized chunks. For every topic we're going to cover in this class, we'll have a module, a lecture of about 10 to 20 minutes, that goes over that topic. These are pre-recorded, and we're going to release each week's modules on the Monday of that week, so you have time to watch them on your own schedule, when it makes sense for you. It's a little bit easier to manage these bite-sized chunks; that's one reason we're moving toward these modules. Also, since they're pre-recorded (we're probably going to use the same recordings as last year), we have more time to spend with you during our lecture times, in a kind of flipped format. So that's the modules. Then, in addition to that,
during our normal lecture time you're going to have two types of activities. On Mondays you're going to have faculty chats. These will be on Zoom, and they're basically small-group discussions with faculty on AI-related topics. There will be six sessions of roughly 25 minutes each: every Monday from 1:30 to 3 p.m., Percy and I will each have a Zoom room for each session, and you're going to be assigned to one of these faculty chats. It is actually mandatory to attend at least one of them. The reason we are doing this is that traditionally, when you have a large AI course with an enrollment of 300-something, it's really hard to get to know you and actually talk to you, and sometimes it becomes really difficult to get to know the faculty when you're in
some of these larger classes. What we're trying to do here is really to get to know you through these faculty chats and to discuss AI-related topics: some of the more recent research material around newer topics like foundation models, which Percy is going to talk about, or topics around robotics, autonomous driving, and so on. I'll talk about the exact format a little bit later in the talk, and there is a little bit of homework to do before coming to these sessions. But the idea is that we'll have these, not in person, sorry, virtual faculty chats on Mondays during lecture time. One other point I want to mention: if you have conflicts during lecture times, you want to make sure that you
won't have a conflict for the time slot you're assigned, because attendance is mandatory. So make sure you actually don't have conflicts during lecture times. [00:06:06] The second bit is problem sessions. These are going to be on Wednesdays, again during class time. They're kind of like traditional sections, except we have changed them a little bit based on the feedback we got last year. During these problem sessions the CAs will work out practice problems: these could be previous years' quizzes or exam questions, or just problems that can help you get started on your homework or get ready for some of the exam questions later on. So I do recommend going to these problem sessions; they're
incredibly useful for getting your hands dirty with the topics you're learning that week. Again, this is on Zoom on Wednesdays during class time. [00:06:54] What else? We're also going to have homework parties. Homework parties used to be very popular when they were in person; last year I think it was a little bit more difficult to make them happen, but eventually people realized that homework parties matter a lot, because they're a very good place to show up and work with other people on your homework problems, get started on some of the more challenging problems together, study together, and the CAs will be there to answer questions. These homework parties are going to happen on Nooks, a platform we started using last year. Again, all the information about Zoom and Nooks, all these links, is on the
CS221 website, so the details of everything I say today are also on the website. [00:07:37] Beyond homework parties, we also have office hours. The CAs hold office hours, and there are two types. There is a set of in-person office hours, which I was talking about earlier; these are limited, but they're going to be in the basement of Huang, and they are group-based. There's no sign-up required; basically there's a group of students with a CA in the basement of Huang. In addition to that, the majority of office hours are actually going to be virtual, and these are by appointment. We used Calendly last year, partly because the queues were becoming too long to handle, so now you can make an appointment for a CA office hour. It's going
to be a one-on-one office hour, and these office hours will happen on Zoom. In addition, our CA office hours come in two categories: we have separated general office hours from homework office hours. If you have homework questions, you should just go to the dedicated homework office hours; but if you have more general questions about the course, if you're thinking about your project or about general AI questions, you should go to the general office hours or the faculty office hours. [00:08:44] And then the final thing we have is faculty office hours. Percy and I will each hold 50-minute office hours weekly, and the schedule for this is also on the website, so you can take a look at that. Again, it's one-on-one, it is going to be virtual, and you can sign up beforehand and come chat with us. Again, all the details are on the website. [00:09:09] All
right, so let me just see if there are any questions here. Does anyone have any questions? Should I look at the chat? Okay, let me quickly look at the chat. "How do we know which faculty chat to attend?" We will be in touch about this soon. Actually, on that, I'll talk about this in a little bit, but basically there's a survey that you need to fill out by, correct me if I'm wrong, Wednesday. Wednesday? Okay, by Wednesday, yeah. That is basically to get your preferences on which faculty chat you would like to go to, and then we'll assign you to specific faculty chats after that. [00:10:01] "Do we sign up for CA and faculty office hours through Nooks?" No, we will use Calendly for signing up, but we'll use Nooks for answering questions. All right. [00:10:17] So if
there are no more questions, let me go to the next slide. Now let's talk a little bit about prereqs; this is a question that often comes up. What are the prereqs for the class? You need to have some programming background, and it would be good if it is in Python; there are some courses that are prereqs for this course. In addition to that, it's a good idea to have some math background, so discrete math (CS103) is a prereq, and it would be a good idea to have some background in probability and linear algebra, so CS109 and Math 51.
Okay. But in general we want you to have familiarity with these, some mathematical rigor, and general familiarity with probability, linear algebra, and discrete math, these types of topics. We are not really expecting very specific knowledge; for example, in linear algebra you'll learn about eigenvectors, but we don't really require knowledge of eigenvectors in this class. So there aren't specific topics we're looking for, but generally you want to know math, you want to know programming, and you want to come into class with that knowledge. The reason is that this course is fairly fast-paced, so you don't want to spend your time learning Python or learning math through this class. Your Python programming is going to improve, your math knowledge is going to improve, but you don't want to
spend time learning this background material; you really want to spend all of your time learning AI. If there are gaps, some people do catch up, it is possible, but again, you want to spend your time learning AI, so we kind of leave it to you to decide and move forward. And you might ask, okay, how do I decide? We have a couple of things online that you can take a look at. We have a set of modules that we actually recorded last year, prereq modules, which provide refreshers on some of these topics. So definitely take a look at these prereq modules; they give you a good sense of what you're required to know coming into this class. In addition to that, the first homework is based on foundations, and it really gives you a good idea of
what to expect as part of this class in terms of, again, the programming and math knowledge coming in. So take a look at these before deciding whether or not to skip a prereq, but in general, again, I do think it's a good idea to have this background coming into the class. [00:12:34] Let's then talk about grading a little bit. Grading is fairly straightforward. We're going to have a set of homeworks, which is 55% of the grade. We're going to have two exams, which is 40% of the grade. For the faculty chats, we actually count participation as part of the grade, so that is 5%. And then the project we are going to make optional this year, so it will count towards extra credit. Finally, if you contribute on Ed (we're going to use Ed this quarter as opposed to Piazza), that is also going to give you some level of
extra credit. In general, you can take the class for a letter grade or pass/no pass; that is also your choice. So now let's talk about each of these components in a little more detail. [00:13:22] In terms of homeworks, we have eight homeworks, and they are a mix of programming questions and written questions. The programming problems are mainly focused on a specific application: for example, we might be looking at blackjack as a game, or at Pac-Man, or at various topics like car tracking. So there's a particular application used as part of the programming component of each homework. These programming components are auto-graded, and there are a set of public and private tests, so you should definitely try out the public
[00:13:59] first. Make sure that you test thoroughly, because the grading is strict: it's based on auto-grading and you don't see all the tests. That's the point I was trying to make here. In addition, you have seven total late days, and you can use a maximum of two per homework. The reason for that is we want to release the homework solutions, so you can't use more than two late days per homework. Okay, so that is our plan for homeworks, the usual; we'll go with that. One other point I want to add on homeworks is that we are adding an extra component to every single homework, which is an ethics component. An ethics component is going to be added to all of our homeworks; it's a new addition that we're having this quarter, and we're also going to significantly change some of these homeworks
[00:14:45] to incorporate an ethics question into them. So we're trying to incorporate that throughout the class, throughout these homeworks, so that would also be an addition to consider in this course. All right, moving forward with exams. Last year we decided to do a set of quizzes; this year we're not going to do the quizzes, students don't really like having one every week, so instead we're going to have two exams. And the point of the exams is really to test your ability to apply your knowledge to new problems. It's not really about knowing the facts we're teaching; it's more about your knowledge of AI and whether you can actually apply it to new problems. All these problems are going to be written, so no coding, and you should take a look at past exams to get a sense of
[00:15:34] how these problems look and what their format is. Each one of the exams is going to be a hundred minutes, and these exams are going to be open book. We actually have the dates for these exams already; they are going to be released in a 24-hour window. The first one is going to be released on October 29th at 3:15 p.m. and is going to be due the next day at 3:15 p.m. Pacific time, and similarly we have exam 2 on December 8th, again at 3:15 p.m. Pacific time. All right, so we have these dates; if you have major conflicts with any of these dates, you should let us know by October 8th, which is week three of the class. In addition, we will not have any late days for these exams, again because we need to release solutions, so we need to make sure
[00:16:28] it works for everyone. So no late days get applied to the exams, and of course no collaboration on the exams. Please do not talk about the exams on Ed: if you've done the exam and you're done with it, but there's still time left within that 24-hour window, do not post anything about the exam. Okay, so that was exams. The last component that is mandatory as part of the class is the faculty chat participation. As I was mentioning earlier, the goal of this is really discussing topics in and around AI. So fill out the initial survey that I was talking about by Wednesday, so that we can start scheduling these; you're going to be assigned a session. Again, six sessions run in parallel on Mondays, during class time on Mondays, so make sure that you can actually
[00:17:20] make that time. You should prepare before these sessions, and the sessions are going to be on different topics. If they are on specific research topics, like robotics, autonomous driving, ethics, robustness, or foundation models, we often have some related material that we release beforehand. Sometimes it's a fireside chat to watch (we had a set of fireside chats last year), or a talk to watch beforehand, so that you come to the session a little bit prepared and we can talk about these topics. We also have another set of topics that are really more about academia versus industry, graduate school, or how you read a research paper, so some of these other components are not necessarily about a particular research area. And again,
[00:18:11] you'll have some reading material for this beforehand, so that you come in prepared. The way we are looking at participation in these faculty chats is: as you come in, you should introduce yourself, and you should also share a little bit about your thoughts or your goals for that session. So you should actively participate in that 25-minute session, and that's what we expect when we're thinking about grading participation during these faculty chats. You will not be tested on the material that you're discussing in the faculty chats; I just wanted to mention that. All right. [Student] Do we need to attend one faculty chat session to get credit? Yes, you will be assigned to one faculty chat. If there is room you can actually attend more faculty chats; we are potentially going to have more room based on the
[00:19:03] going to have more room based on the number of students who are enrolled but [00:19:04] number of students who are enrolled but uh we will be in touch on like what is [00:19:06] uh we will be in touch on like what is like what are the availabilities and if [00:19:08] like what are the availabilities and if you can attend more than one faculty [00:19:10] you can attend more than one faculty chat but yeah you'll be assigned one [00:19:13] chat but yeah you'll be assigned one okay so let me talk about uh the project [00:19:16] okay so let me talk about uh the project also real quick so [00:19:18] also real quick so um the project this quarter is going to [00:19:20] um the project this quarter is going to be optional uh this is what we did last [00:19:22] be optional uh this is what we did last year too because the course is virtual [00:19:24] year too because the course is virtual and we thought it would be um [00:19:26] and we thought it would be um it might be a little bit more difficult [00:19:27] it might be a little bit more difficult to to find a team and work together but [00:19:29] to to find a team and work together but regardless like a lot of students did [00:19:31] regardless like a lot of students did the project last year and and there are [00:19:34] the project last year and and there are a lot of interesting ideas and projects [00:19:36] a lot of interesting ideas and projects came out of that and it was really [00:19:37] came out of that and it was really exciting to see like so many cool [00:19:39] exciting to see like so many cool projects uh like during that quarter too [00:19:41] projects uh like during that quarter too so i do recommend that you guys look [00:19:43] so i do recommend that you guys look into this closely even though it is [00:19:44] into this closely even though it is optional so so the idea is you want to [00:19:46] optional so so the idea is you want to choose a task uh where you can actually [00:19:49] choose a task 
[00:19:51] where you can actually apply some of the ideas that you have learned as part of this class and use those techniques for that particular task. It's a little bit open-ended, you need to decide what that task is, but that's also the beauty of it, right? You can pick anything and apply some of the AI techniques that you're learning. The idea is that you can work in groups of up to four people, and then you also have a set of milestones: you need to fill out a project interest form, there's a proposal, a progress report, and there's a video and final report that you need to do. So if you decide to do the project and actually get the extra credit, you should pass through all these milestones and finish the project. Again, the task is completely open, but there is a set of
[00:20:33] well-defined steps that we expect you to have throughout the course for this project. This includes things like defining the task, implementing your baselines and oracles, having a literature review, and thinking about what your evaluation metrics are. And you will have a CA assigned to you: if you decide to do a project, you'll have a CA assigned to your group, and your CA can also walk you through some of these different components that you want to have as part of your project. In addition, one other thing that we've added is a mandatory check-in meeting with your CA. This is a 15-minute mandatory check-in meeting with your CA; we think this is really useful to make sure that you keep up with the project if you decide to do it. And in
[00:21:23] general, if you want to think about ideas for what to do for your project, or if you have some idea and you want to discuss it, definitely come to office hours. You can come to Percy's and my office hours, or the CAs' office hours, and discuss some of these questions. All right. The last point that I want to mention on logistics is the honor code. I want to spend a little bit of time talking about this because it is really important: you don't want to deal with it, we don't want to deal with it, so let's just talk about it and get it out of the way. Especially this quarter, given that things are online, we do want you to collaborate, to discuss together, learn together, think about problems together, but the write-up and the code need to be done independently. So you need to write your code and
[00:22:08] write up your solutions independently, based on your own thoughts and your own ideas. So please do not share code, please do not share your write-ups with others, and don't look at anyone else's write-up or code, even if it is on the internet and you found it; do not look at these things. And do not post your solutions online: even if you're proud of your code, you shouldn't post it on GitHub, do not do that. In general, when you're debugging, try to look at input-output behavior. You could be going to homework parties and debugging your solutions with other people; really just look at input-output behavior, don't look at each other's code, and that way you'll be safe. But I do want to emphasize that we do run MOSS periodically, and this will automatically detect if there is matching between
[00:22:55] code, so please do not do that; MOSS is really good. Every year we have a number of cases, and sometimes we run these things mid-quarter, so I want to emphasize that too: you don't want to go through these things mid-quarter, and it's again something that we don't want to deal with, so let's just not do it. We're also changing a number of homework questions and adapting things to make this a little bit easier on everyone. All right. The last point I want to make is on communication. We're going to use Ed this quarter, so in general, if you have any questions, the best idea is to make a public post. That way the course staff, students, everyone can see it, and you have a broader group of
[00:23:45] people who can answer that question, and probably other people are thinking about that question too. So that's the best way of communicating with us. If there is a private question, make a private Ed post, and that way the course staff can see it; for example, if you have a question that could give away answers, it's a good idea to post it as a private post. And in general, if there are sensitive matters that you want to discuss, or OAE accommodations, you should email this particular email address. It goes to only four people: Percy and I, and Shiri and Faith, our student liaison and our head CA. So if you have any sensitive matter, just send an email to this email list that goes to the four of us. In addition to that, we're going to have periodic surveys. You already have a welcome survey on
[00:24:29] Canvas, so please take a look at that and fill it out, and that way we can start getting some feedback. And again, as the course is virtual, we would love to get more periodic feedback throughout the course, so it would be great to give us feedback and tell us what works and what doesn't, so we can adapt. And again, all these details, everything I've said so far, are on the course website. With that, I can take any questions about logistics; I know I covered quite a bit on logistics. If anyone wants to just ask a question, that's probably easier. [Student] So on the exam, if we're looking for a clarification, should we post that privately to Ed, or not at all, or should we email it? Assuming that it's not something that would give anything away; it's just supposed to be a clarification of what's
[00:25:23] intended by the question. You should post a private post on Ed that only goes to the course staff. [Student] That was a good guideline, but then, as far as coding in Python, what about the use of basic routines? Obviously not trying to copy code wholesale, but as far as using things like Stack Overflow and others, as virtual tutorials for the various things that you want to accomplish with the Python that you're writing. I assume that as long as it's just little routines, it's not a problem; the problem is when you're taking somebody's idea wholesale. Yeah, in general, try to write things yourself. When it comes to writing the
[00:26:31] code part of it, you can get ideas, you can discuss the idea with other people, or you can look at online forums for ideas, but when it comes to writing the code, try to just write it yourself. If there are specific things that you're not sure of, you should go to the CA office hours or our office hours and ask us about that specific instance, and we can talk about it then. Okay. All right, so let's move forward; now let's talk a little bit about the course content. What are we discussing, what is AI, what are we going to be covering in this class? So in general, in AI you're interested in solving realistic, complex problems that have a lot of messiness and uncertainty. If you think about a complex problem, let's say routing cars in a city with a
[00:27:22] lot of complex things happening in that city, how do you go about solving that? Let's say the question is just routing the vehicles. You're not going to just start writing code for it, right? Starting from scratch, without really having a formalism, and directly coding it seems pretty difficult. And in general, there's a gap between the code, the software and hardware, that we develop as AI scientists and engineers, and what is happening in reality: the real world, with all the messiness and complexities that exist. What AI, and what this course, is trying to do is bridge that gap: to figure out how we can take some of these real-world problems and make them simpler, in a way that is manageable, so we can develop algorithms
[00:28:08] and code for it. For that we have a paradigm in this class that we'd like to follow, and this paradigm has three core components, three pillars: modeling, inference, and learning. I'm going to talk a little bit about these. The idea is that we take a very difficult problem, we model it, and then we develop inference algorithms for it; and throughout this process the model could have a set of unknowns, and we use learning throughout to actually make our models better. So let me try to make this a little bit clearer moving forward. Let's go back to the real-world problem we were talking about: routing vehicles in a city. This is a big problem, and in general I would like to have a formalism. So what modeling does
[00:28:59] is take that complex problem and try to come up with a formalism, a mathematical way of thinking about that problem. And modeling, just by definition, is lossy: I'm not going to capture all the complexity that exists in the real world. All models are wrong, but some are useful, right? So under that idea, of course you're going to lose some of this complexity, but we're still going to come up with something that is useful for the goal that we have. Maybe I would like to find the shortest way of getting from one road to another road, and if that is my goal, I can basically model this real-world problem as a graph problem, where I have a bunch of edges and vertices: my vertices here are maybe my locations in the world, and the edges are maybe the roads that connect them. Okay, so this would be a
[00:29:47] graph model that represents that real-world problem. We're going to spend quite a bit of time in this class talking about modeling. And then, once I have a model, I can start asking questions about that model: I can ask what is the shortest path from one node to another node, or what is the most scenic path from one region to another region, or I might have different objectives that I would like to optimize. Inference is really a way of trying to solve that problem and give us an answer to some of these questions that we have here. How do we make predictions, how do we figure out what is the right path to take in this problem: that is the kind of thing inference gets us. And then finally, the last pillar is learning.
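To make the modeling-and-inference split concrete, here is a small sketch (my own, not from the lecture; the location names and road lengths are invented): the model is just a weighted graph, and inference is a shortest-path query over it.

```python
import heapq

# Model: a weighted graph. Vertices are locations, edges are roads
# with travel costs. (Hypothetical data, purely for illustration.)
graph = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}

def shortest_path_cost(graph, start, goal):
    """Inference: Dijkstra's algorithm answers 'what is the shortest path?'"""
    frontier = [(0, start)]
    best = {}
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node in best:
            continue  # already settled with a cheaper cost
        best[node] = cost
        if node == goal:
            return cost
        for neighbor, weight in graph[node].items():
            if neighbor not in best:
                heapq.heappush(frontier, (cost + weight, neighbor))
    return float("inf")

print(shortest_path_cost(graph, "A", "D"))  # A -> C -> B -> D costs 2+1+5 = 8
```

A different objective ("most scenic path") would keep the same model and swap in a different inference query; that separation is the point of the paradigm.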
[00:30:34] The way I want you to think of learning is this: if you think about that model, oftentimes you're not going to be able to write everything in that model with all the complexities. What we can do is write a skeleton for what we're trying to do, maybe a graph, but in that graph you might not know what the weights on the edges are. We might not be given the edge values here, because that would be too complicated to write, or we might just not have them a priori at the beginning. So we often have a model without parameters, and the goal of learning is to look at data and, from data, complete this model and add these parameters that were unknown at the beginning. So what learning is really doing is taking the complexity that we have in writing the specification, writing the model, and it takes that away and puts
[00:31:23] that into data. And given that there is data, I can take that data and then, based on how good that data is, or based on what I can learn from that data, complete my model and have a better model that I can actually do inference over. So we're going to have learning throughout this class as a pillar in every section that we'll talk about. All right, so modeling, inference, and learning are the three pillars that keep appearing throughout every week of this class. But what is our course plan? Our plan is really to talk about different types of models, starting from low-level intelligence all the way to higher-level intelligence, and we're going to basically go over a variety of these models. But before we start talking about these, we're going to actually spend two weeks talking about machine learning.
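That skeleton-plus-data idea can be sketched in a few lines (again my own toy, with made-up numbers): the graph structure is written down by hand, but the edge weights are left unknown and filled in from observed trips.

```python
from collections import defaultdict

# Skeleton of the model: we know which roads exist, not how long they take.
edges = [("A", "B"), ("B", "C")]

# Data: observed travel times for individual trips (invented numbers).
observations = [
    ("A", "B", 4.0), ("A", "B", 6.0),   # trips on road A-B
    ("B", "C", 3.0), ("B", "C", 3.0),   # trips on road B-C
]

# Learning: complete the model by estimating each unknown weight
# as the average observed travel time on that road.
totals, counts = defaultdict(float), defaultdict(int)
for u, v, t in observations:
    totals[(u, v)] += t
    counts[(u, v)] += 1
weights = {e: totals[e] / counts[e] for e in edges}
print(weights)  # {('A', 'B'): 5.0, ('B', 'C'): 3.0}
```

The complexity has moved out of the specification and into the data: better or more plentiful observations give a better-completed model, without changing a line of the skeleton.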
[00:32:10] This is just to get some of the basics of machine learning out of the way. Also, machine learning in general is a very powerful tool that has been quite impactful in the field of AI, so it's a good idea to learn some of these ideas in machine learning at the beginning, so that we can actually use it throughout the class when we are thinking about learning, modeling, and inference for the different types of models that we will discuss. Okay, so next week and the week after are basically going to be modules on machine learning. And I'm just spending a little bit of time on what machine learning is. So again, the role of machine learning is to take data, and from that data try to generate these models that were at the beginning incomplete, but now we
[00:32:56] can actually use them, and we can actually incorporate the data, the information that's in that data, into the model. And the idea of it is really moving from code to data: again, moving the complexity that exists in the code to complexity existing in the data. One other point about machine learning is that it kind of requires faith, right? If we have some data and we build the model based on that data, there's no reason, on the surface, that that model should work in a new scenario, that it should generalize to new settings. We'll talk about this idea of generalization quite a bit: when is it that the model can generalize to new settings? If I've trained it on some set of data of, let's say, house prices, how can I make sure that this model will actually work in a new setting, for a new house? And that kind of goes back to this question of generalization.
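The house-price example can be made concrete with a tiny sketch (all numbers invented): fit a model on some training houses, then judge it on a house it has never seen, which is what generalization asks about.

```python
# Toy generalization check (invented numbers): train on some houses,
# then evaluate on a held-out house the model has never seen.
train = [(1000, 200.0), (1500, 300.0), (2000, 400.0)]  # (sqft, price in $k)
held_out = (1200, 240.0)

# Learning: least-squares fit of price = w * sqft (no intercept term).
w = sum(x * y for x, y in train) / sum(x * x for x, _ in train)

# The interesting number is the error on the unseen house,
# not the error on the training set.
x_new, y_new = held_out
print(round(w * x_new, 1))  # prediction for the new house
```

Here the held-out prediction happens to be perfect because the fake data is exactly linear; with real data the gap between training error and held-out error is exactly the generalization question.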
[00:33:45] We'll spend time on that. All right, so that was machine learning. As we talk about machine learning in the first two weeks of the class, we're also going to spend a little bit of time talking about reflex-based models. These are kind of the lowest level of intelligence in terms of the modeling paradigms that we'll be talking about throughout the course. And here's an example of a reflex-based model: I'm going to ask you guys, what is this animal? Maybe you can put it in the chat. What was it? It was a zebra, right? And you were able to very quickly figure out that you just saw a zebra here. This is really based on your reflexes; this is really an example of what a reflex-based model could do. Other examples of
[00:34:37] reflex-based models are things like linear classifiers or deep neural networks. The reason I'm calling these low-level intelligence is that we're not doing a lot of reasoning here: we basically have a feed-forward model, and we're not putting much computation into responding and saying, well, that was a zebra. We were just able to quickly say that that was a zebra. These reflex-based models are the most common form of models in machine learning. They're often fully feed-forward: no backtracking, no reasoning about what was going on, just evaluating the model. Deep neural networks are an example of this; linear classifiers are an example of this.
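A minimal sketch of why inference in a reflex-based model is so cheap (my own example, with made-up weights): a linear classifier is one dot product and one sign, a single feed-forward pass with no search at all.

```python
# A reflex-based model in miniature: a linear classifier.
# Inference is a single feed-forward evaluation, no backtracking.
# (Weights are invented for illustration.)
weights = [2.0, -1.0, 0.5]

def predict(features):
    """One pass: weighted sum of the features, then the sign."""
    score = sum(w * f for w, f in zip(weights, features))
    return 1 if score >= 0 else -1  # e.g. "zebra" vs "not zebra"

print(predict([1.0, 0.5, 2.0]))   # score 2.0 - 0.5 + 1.0 = 2.5 -> 1
print(predict([0.0, 3.0, 1.0]))   # score -3.0 + 0.5 = -2.5 -> -1
```

A deep network is the same shape of computation, just with more layers in the forward pass; the "intelligence" is baked into the weights, not into any reasoning at prediction time.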
[00:35:21] That's actually why, as we discuss machine learning, we're also going to spend a little bit of time thinking about reflex-based models: inference is extremely simple, and we just call the model. All right, so moving along, one level higher on top of the reflex-based models, we're going to talk about state-based models, and we're going to talk about three types of them: search problems, MDPs, and adversarial games. So what are state-based models? Here is an example: let's say you want to play a game of chess, and you want to figure out what the next move of white should be. This is not the same as detecting whether that animal was a zebra; this is actually a lot more difficult than that. You actually need to sit down and do a little bit of reasoning, figure out what state of the world you're in, and figure out how the world
[00:36:10] is going to evolve. So there is this notion of sequences of actions and sequences of states that come after each other, like A leading to B and so on, and this brings us to the idea of state-based models. They have many applications, including in games: if you think about games like chess, Go, Pac-Man, StarCraft, these are all examples where state-based models are a good way of modeling them. They show up in robotics all the time: think about motion planning, like getting a robot arm to move from one location to another; we oftentimes use state-based models as a way of formalizing that. They also show up in natural language generation, machine translation, image captioning. They're basically all throughout AI, and they're a very good way of thinking
[00:36:59] about what sufficient information you need to know at the current time, how that should evolve in the next time step, and then adding an ordering of going from this state to the next state. So we'll talk about three types of state-based models. We'll talk about search problems, where you can actually control everything: you have a state, and based on the action that you take, you end up in a new state. We'll talk about Markov decision processes, which make search problems a little bit more difficult by adding uncertainty that comes from the world. So basically, these Markov decision processes are state-based models where you're playing against nature: nature gives you some probabilities, you look at coin tosses, and based on that you proceed, so there is this notion of uncertainty. And then we'll spend some
[00:37:47] time talking about games, adversarial games, where you're not playing against nature, which is probabilistic; instead you're playing against another opponent, which is also very intelligent and is making decisions against you, as opposed to with you. And we'll basically go over these different types of state-based models a little bit. Okay, so as part of the homework for state-based models, we're going to play around with the game of Pac-Man; I just want to show a quick demo of this game here. So you're going to play around with the game of Pac-Man and basically come up with algorithms for Pac-Man that can avoid ghosts and eat these food pellets, and it will be kind of fun playing around with it. Let me go back to my slides.
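A search problem of this flavor can be sketched very compactly (my own toy, not the course's Pac-Man code): the state is a grid position, actions move you deterministically to a neighboring state, and breadth-first search finds the fewest moves to a goal.

```python
from collections import deque

# Tiny grid search in the spirit of Pac-Man (hypothetical layout).
# State = (row, col); actions move one step; walls are blocked cells.
walls = {(1, 1)}
rows, cols = 3, 3

def successors(state):
    """The transition model: which states one action can reach."""
    r, c = state
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in walls:
            yield (nr, nc)

def bfs(start, goal):
    """Breadth-first search: fewest moves from start to goal."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        state, depth = frontier.popleft()
        if state == goal:
            return depth
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None

print(bfs((0, 0), (2, 2)))  # 4 moves, routing around the wall at (1, 1)
```

In a search problem you control everything; an MDP would replace `successors` with a probability distribution over next states, and an adversarial game would interleave an opponent's moves.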
[00:38:38] As you're thinking about Pac-Man, and state-based models in general, the things to think about are: what is the notion of state, how do you transition from one state to another, and how can you come up with a strategy, a policy, that gets you from one point to another so that you avoid the ghosts, eat your food pellets, and so on. These are some of the questions that we're going to talk about when we discuss state-based models. All right, so moving forward, we're then going to move to the next level of intelligence, and that is variable-based models. An example of a variable-based model is something like a game of sudoku. If you think about state-based models, there's a notion of sequential ordering of states: you have to go through A to get to B. If you think about moving through a graph to solve shortest path, you actually
[00:39:24] need to reach city one first, and then after that city two. But there is a set of problems that don't really require that type of strict ordering. Think of the game of sudoku: you have a bunch of numbers, and you want to make sure that you can fit the digits one through nine in every row and column, and the order in which you put in these numbers doesn't really matter. You can put the nines first, or you can put the ones first, and that really doesn't matter. That brings us to this idea of variable-based models, where we don't have this strict ordering, and because of that we can do something that's a little bit more intelligent, which helps us come up with better algorithms in these settings. So we will talk about two types of variable-based models.
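The sudoku idea can be phrased as variables plus constraints, sketched here on a deliberately tiny 2x2 "sudoku" of my own invention: fill the grid with 1 and 2 so each row and column has distinct values. Note that the solver assigns variables in an arbitrary order; the ordering carries no meaning, which is exactly the contrast with state-based models.

```python
import itertools

# Miniature constraint satisfaction problem (my own toy, not the
# course's): variables are the cells of a 2x2 grid, domain {1, 2},
# constraint: each row and each column contains distinct values.
cells = [(r, c) for r in range(2) for c in range(2)]

def consistent(assignment):
    for cell1, cell2 in itertools.combinations(assignment, 2):
        (r1, c1), (r2, c2) = cell1, cell2
        if (r1 == r2 or c1 == c2) and assignment[cell1] == assignment[cell2]:
            return False
    return True

def backtrack(assignment):
    """Assign one unassigned variable at a time, undoing dead ends."""
    if len(assignment) == len(cells):
        return dict(assignment)
    var = next(c for c in cells if c not in assignment)
    for value in (1, 2):
        assignment[var] = value
        if consistent(assignment):
            result = backtrack(assignment)
            if result is not None:
                return result
        del assignment[var]
    return None

solution = backtrack({})
print(solution)  # a valid filling of the 2x2 grid
```

Real sudoku is the same structure with 81 variables, domain 1..9, and row/column/box constraints; only the bookkeeping grows.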
[00:40:13] We'll talk about constraint satisfaction problems: these are settings where we have hard constraints. Sudoku was an example: you have a hard constraint, you have to actually fit one through nine in your board. Or scheduling-type problems: a person cannot be at two places at the same time. So there are very strong, strict relations between the different variables that exist. But in addition to that, we also have Bayesian networks, which try to take those hard constraints and make them soft: there are soft dependencies when you think about Bayesian networks, unlike, let's say, sudoku or scheduling. An example is, let's say you want to track an airplane, or you want to track a car. If you're tracking your car, you might have a set of sensors on that car, and those sensors are noisy: they're not going to give you the ground truth of where the car is.
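A toy version of that car-tracking setup (my own sketch, with invented probabilities): the car sits at one of five positions, a noisy sensor reports where it might be, and we also exploit the knowledge that the car moves smoothly. Folding both together is the forward filtering computation that Bayesian-network-style models support.

```python
# Toy car tracking (invented numbers): positions 0..4, noisy sensor,
# and a smooth-motion model; filtering combines the two.
positions = range(5)

def transition_prob(prev, cur):
    # The car stays put or moves one step right, equally likely.
    return 0.5 if cur in (prev, prev + 1) else 0.0

def sensor_prob(true_pos, reading):
    # Sensor is right 80% of the time, otherwise off by one either way.
    if reading == true_pos:
        return 0.8
    return 0.1 if abs(reading - true_pos) == 1 else 0.0

belief = [1.0 / 5] * 5          # start with a uniform belief
for reading in [1, 2]:          # made-up sensor readings over two steps
    # Predict: push the belief through the motion model.
    predicted = [sum(belief[p] * transition_prob(p, c) for p in positions)
                 for c in positions]
    # Update: weight by the sensor model and renormalize.
    belief = [predicted[c] * sensor_prob(c, reading) for c in positions]
    total = sum(belief)
    belief = [b / total for b in belief]

print(max(positions, key=lambda c: belief[c]))  # most likely position
```

Neither source of information alone pins the car down; the soft dependencies between consecutive positions and between position and reading are what make the estimate sharp.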
[00:40:59] You also know that your car cannot teleport, so where it was at the previous time step and where it is at the current and next time steps are related to each other. Based on these different types of relations, of where the car is and where the car is going to be, plus the fact that you have these noisy sensor readings, you can have these soft dependencies between the variables, and that allows you to estimate where the car is. That's the topic we'll discuss through Bayesian networks. We'll have a homework on this, and it will actually be about tracking cars; that'll be exciting. All right, and then finally, the last component that we are going to discuss is going to be on logic, and this brings us to the highest level of intelligence. As an instance of an example that uses logic, we can think
[00:41:48] As an example that uses logic, we can think of a virtual assistant. What do you want from a virtual assistant? You oftentimes want to tell it some information, and you also want to be able to ask it some questions and expect it to respond, and maybe you would want to use natural language as the way of communicating with it. We actually go through a virtual assistant example as part of the logic homework, and I want to show a quick demo of that here. Let me see if I can bring the terminal to the right window. Ah, there you go. Okay, so this is a tool that we're going to play around with during the logic homework. Basically it's a virtual assistant: you can give it information, and you can ask it questions.
[00:42:45] Let me try an example. Let me actually give it some information: I'm going to say "Alice is a student." Okay. (Sorry, Dorsa, could you zoom in? Oh, yeah.) Okay, so I told it Alice is a student, and it just learned something. I can ask it now, "Is Alice a student?" What should it say? It says yes, right, because I just told it Alice is a student. I'm going to ask, "Is Bob a student?" What should it respond? It should probably say "I don't know," right? Because how would it know? "I don't know." Let me give it some facts. I can say "Students are people." Okay. Then I can say "Alice is not a person." Let's see what it says in response to that. Okay, it says "I don't buy that." So it understands contradiction: I told it students are people, I showed it a generalization, and now this is a contradiction to that, and it understands that.
[00:44:01] I can say "Alice is a person." Let's see what it says: it confirms, "I already knew that." Okay. Let's give it some more information. "Alice is from Phoenix," maybe, let's do that. "Alice is from Phoenix": "I learned something." We can say "Phoenix is a hot city": "I learned something." I can say "Cities are places": "I learned something." Let me actually make this a little bit smaller so you can see it. Okay: "If it is snowing, then it is cold." So I'm teaching it this kind of if-then type of statement. "I learned something." Okay, I'm going to ask it, "Is it snowing?" What should it say? It doesn't know. Okay, so it says "I don't know." So I'm going to give it more information: "If a person is from a hot place and it is cold, then she is not happy." Okay, so I'm giving it this more complicated if-then type statement. "I learned something."
[00:45:14] I'm going to ask, "Is it snowing?" What would it say? It doesn't know, right: "I don't know." Let's say "Alice is happy." Okay, now I'm going to ask, "Is it snowing?" What would it say? Right: it's not snowing. Okay, yeah. So this was just an example going over this virtual assistant, and you'll play around with it in the logic module. You will be thinking about this idea of giving information, asking for information, and the logical relationships between them, and this will be something we will work on, so I just wanted to quickly show this demo. But one thing to notice here is that we were giving it heterogeneous pieces of information. This was very different from giving it millions of pictures of cats and training a neural network; I was giving it very heterogeneous information.
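The tell/ask behavior in the demo can be sketched with a toy forward chainer. To be clear, this is only a sketch with invented names, not the homework assistant: it handles signed facts, simple if-then rules, and contradiction detection, but not the richer inference (like the not-snowing deduction) that the real system performs:

```python
# Tiny tell/ask loop in the spirit of the demo: facts are signed
# atoms like ("student", "alice", True), rules are implications
# (premises -> conclusion), and ask() answers "yes", "no", or
# "I don't know". A toy forward chainer, not the homework assistant.
facts = set()
rules = []          # each rule: (list_of_premise_facts, conclusion_fact)

def close(fs):
    # Forward chaining: fire rules until no new facts appear.
    fs = set(fs)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if all(p in fs for p in premises) and conclusion not in fs:
                fs.add(conclusion)
                changed = True
    return fs

def tell(fact):
    closed = close(facts | {fact})
    # Contradiction check: some atom derived both true and false.
    if any((pred, arg, not val) in closed for pred, arg, val in closed):
        return "I don't buy that"
    facts.add(fact)
    return "I learned something"

def ask(pred, arg):
    closed = close(facts)
    if (pred, arg, True) in closed:
        return "yes"
    if (pred, arg, False) in closed:
        return "no"
    return "I don't know"

# "Students are people" (specialized to alice for this toy sketch).
rules.append(([("student", "alice", True)], ("person", "alice", True)))
tell(("student", "alice", True))    # "I learned something"
```

With these definitions, `ask("student", "alice")` yields "yes", `ask("student", "bob")` yields "I don't know", and `tell(("person", "alice", False))` is rejected with "I don't buy that", mirroring the three behaviors shown in the demo.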
[00:46:13] The system was able to reason about this information in a very deep way, right? It was making these deep connections, and I could ask it these questions, and that's very exciting: being able to have these kinds of deep interactions between the symbols that we are providing it. All right, so that brings us to the end of this module, where we were thinking about different types of models. Just as a quick recap: in this class we are going to talk about models from low-level intelligence all the way to high-level intelligence: reflex-based models, state-based models, variable-based models, and logic. And for each one of these models, we're going to talk about the usual paradigm.
[00:47:02] We'll talk about modeling in each one of these settings; then we'll talk about inference: what are the different inference algorithms we can use? And in addition to that, we'll talk about learning: how can we take data and learn and improve our models for each one of these components? That paradigm keeps showing up throughout the class, basically every week. All right, so now let's spend five minutes and have an icebreaker. He's going to put us into groups of four in breakout rooms, and during these five minutes let's just introduce ourselves to the others. And maybe to set the stage, let's discuss a question, and the question is: what is the biggest benefit of AI, and what is the biggest risk of AI? When you come back, try to put that on the chat.
[00:47:52] We will discuss and go from there. Okay, so let's spend five minutes in breakout rooms.

[00:47:57] So, it was good talking to some of you during the breakout rooms. Maybe you can put some of your responses, some of the things you discussed, on the chat, as a way of discussing some of them. Also, a quick thing: don't direct-message me on chat; I'm barely looking at it. If there are any questions, please ping the CAs, or email me later on, or post questions on Ed. Okay. All right, so, yeah: biggest benefits, biggest risks, anyone have thoughts? Okay, thank you for starting this. "Improving people's lives," "tangible applications," "mutual assistance"; biggest risks: "ML fairness," "ethics." Yeah, these are all great points; I'm going to talk about them a little bit later on.
[00:49:00] But now let's continue with the next segment, where I want to give a little bit of the history of AI. This is going to be brief; I don't want to go into too much detail, and it's not going to be a complete history, but I think it's a good idea to talk about it, because it gives a little insight into why we are where we are today and how things took shape over time. If you want to give a history of AI, you can really go back to 1950, when Alan Turing put out his landmark paper, "Computing Machinery and Intelligence." In this paper, Turing asked a question: can machines think? And he came up with his answer, which was the imitation game, which you might know as the Turing test.
[00:49:49] The idea of the Turing test is that a machine passes if it is able to fool a person into thinking that it's actually a human. This paper was really foundational in the sense that it started allowing us to think about intelligence a lot more carefully and to actually try to formalize it in a better way; it was one of the first foundational works on formalizing this idea of intelligence. Now, we might argue about whether the Turing test is a good test for measuring intelligence, and we might have various opinions on that, but that part is not really what matters. The part that matters is thinking about intelligence and being able to formalize it.
[00:50:38] One other thing Turing provided in this paper was the idea of separating the question you're trying to answer, the what, from the how: how are we going to answer this question? Turing came up with the imitation game, and basically what this gave us was an objective specification: this is the thing we are trying to get at, our specification. But how we do it, how the machine really does it, he didn't specify. This modularity of specifying what we are trying to get and how we are trying to get it is a really foundational idea that we have been using in a lot of our algorithms, and we'll see it throughout this class: separating the objective from the algorithm, from how we go about it, is actually quite important, and it's a very good foundational idea.
[00:51:28] One interesting thing is that Turing, at the end of this paper, also provided some ideas about how we should go about it, and he talked about two different approaches. One was a very abstract way of going about the problem, kind of a top-down view, like how we would go about solving chess; this is really related to the idea of symbolic AI that I'm going to talk about. He also provided another potential way of going about it, which is having machines that have sense organs, a.k.a. sensors, and then teaching them like a child. This idea of taking a machine, putting sensors on it, letting it sense data, and having it learn from that data is very related to the idea of neural AI. So since this point, 1950, when Turing put out this paper, there have
[00:52:18] been three different flavors of AI around: symbolic AI, neural AI, and statistical AI. I want to give a brief history of each one of these. Let's start in 1956, with the story of symbolic AI; this is the first flavor of AI I want to talk about. The term AI really goes back to 1956, when John McCarthy organized a workshop at Dartmouth College. McCarthy was later a faculty member in Stanford CS, and he actually created the Stanford AI Lab. He organized this workshop at Dartmouth College that summer and invited a lot of other big names: Marvin Minsky, Allen Newell, Herbert Simon. He invited all of these people, and the goal of the workshop was to think about intelligence.
[00:53:13] They had a very ambitious goal: they wanted to think about every aspect of learning and every feature of intelligence, and to model them so precisely that they could have a machine that could simulate them. That is a very ambitious goal, right? They were really after generality: they wanted to figure out the general principles of intelligence and learning, so that they could have an artificial intelligence, an intelligent agent, that could simulate them. That was really exciting, and immediately after this workshop all these people went their separate ways and started producing really cool systems. This was really the birth of AI, and you started seeing a lot of early successes. In 1952, Arthur Samuel put out one of the first checkers
[00:54:02] programs, which was able to play checkers at the level of an amateur, which was really exciting. In addition to that, in 1955, Newell and Simon came up with one of our first theorem provers: they basically had a system that could solve problems and prove theorems in a general way. They came up with a proof for a theorem that was actually more elegant than what people had before, and they tried to publish a paper on this proof, but the paper got rejected because the reviewers thought the theorem already existed. Still, it was really exciting to be able to have systems that could prove theorems, play checkers, and solve problems in general. And there was a lot of optimism from all these really famous people in the field.
[00:54:51] They all had a lot of optimism about what is possible with AI. Herbert Simon said, "Machines will be capable, within twenty years, of doing any work a man can do." Marvin Minsky said that within ten years the problem of artificial intelligence would be substantially solved. Claude Shannon said, "I visualize a time when we will be to robots what dogs are to humans, and I'm rooting for the machines." These are not random people on the street; these are famous people, founding fathers of AI, and this is some of the overwhelming optimism people had around that time about what we could actually do with these AI systems. Unfortunately, we started seeing very underwhelming results. Around this time, the government really cared about the problem of machine translation, and there was a lot of funding around it, and then we started seeing results that were
[00:55:38] kind of underwhelming. Here is a made-up example, but the results were things of this form: you might have a text that says "the spirit is willing, but the flesh is weak," translate that to Russian and translate it back to English, and get a text that says "the vodka is good, but the meat is rotten," which is not very good. As we started seeing these results, governments started putting out reports about how these results were not so great, and they started cutting off funding for AI research. This is around the time we started seeing the first winter of AI: a lot of optimism that wasn't really going anywhere, and then this first winter of AI, and that wasn't so great.
[00:56:31] If you think about this first early era of AI, what were some of the problems? First off, we had very limited computation. A lot of these problems were written as logical problems and usually solved as search problems, where the search space just grows exponentially, and with the limited hardware we had, it was simply not possible to solve these very difficult problems. But even if we had had infinite compute at that time, which we didn't, there was another problem: we had limited information. To solve some of these very complex AI problems that people were thinking about, they needed to write out the problems and the knowledge around them using words and objects, writing out the concepts, and it was very difficult to actually provide all this information.
this information; we had really limited information about some of these concepts. But regardless, we started seeing a lot of interesting contributions come out of this era. Even though it was a failure and we had this winter of AI, there were a lot of interesting ideas that came out around this time: we had the Lisp programming language, and we had ideas like garbage collection and time sharing, and a lot of these ideas are actually associated with John McCarthy. So it was exciting to see a lot of advances, even though the problems were still there and we couldn't really solve the big problem. [00:57:47] And this really brings us to the era of the 70s and 80s. In the 70s and 80s, people really started thinking about this idea of knowledge and building knowledge-based systems. The core idea was that knowledge is really
the key: if we can encode knowledge, if we can bring in ideas from experts, bring in domain knowledge and incorporate that into the system, then we can actually solve interesting AI questions. So this was the rise of expert systems, where we basically elicit domain-specific knowledge from an expert and encode it into if-then-else type statements, into rules that the system can call on to solve various types of problems. [00:58:27] There was also another shift around this time. The first era of AI, the John McCarthy, Dartmouth College workshop era, was all about understanding intelligence: being able to say, well, what is human intelligence, and can we simulate that? And that didn't really work out. But in this new era, people started
changing paradigms, and they started thinking about applications a lot more. So, sure, you're not going to be able to think about intelligence and simulating intelligence, but I can build systems that can be used in chemistry, or in medical diagnosis, or in business. This was the first era in which people started building AI systems that maybe didn't say much about simulating intelligence, but were about solving interesting, useful problems that could be used in industry. [00:59:16] So, lots of events during this time. These knowledge-based systems really helped us with both the information gap and the computation gap: they allowed us to incorporate knowledge and information, and by doing so they allowed us to prune the search space, which meant we needed less compute to solve some of these
problems. So that was exciting, and this was the first time we were seeing real applications that actually impacted industry, which was also very exciting. [00:59:46] But there were still some problems around this era. One of the problems was that these rules were very deterministic, and they couldn't really handle the uncertainty that exists in the real world; we had all these deterministic connections and rules, and they just weren't coming together and capturing the complexity that exists in the world. In addition to that, the rules were becoming very complex very quickly. So this is a quote from Terry Winograd, who was a faculty member in HCI in the CS department at Stanford, but who at the time was actually a faculty member working in AI at MIT. Here is what he said about these knowledge-based
systems. He said, well, these systems are a dead end: they have very complex interactions that are difficult to handle, and there are just really no easy footholds. And this brings us to the second winter of AI. So, lots of excitement, we were seeing real applications, but there was still quite a bit of difficulty in extending these systems. [01:00:46] And this was kind of the end of this era of symbolic AI; symbolic AI really dominated AI for many decades. Now I want to go back in time. I'm now in 1987, but I want to go back and tell you a little bit of the history of neural AI, where that started and what its progression was, and that takes us to 1943.
So, going back in time, let's think about artificial neural networks and how they started. In 1943, McCulloch and Pitts came up with the first artificial neural network, where they basically modeled a single neuron. They were thinking about simple logical relations, like ANDs and ORs, and they weren't thinking about learning rules or anything of that form at that point. So that is 1943, the very first version of artificial neural networks. And in 1949, Hebb came up with the idea of a learning rule. This learning rule was very simple: cells that fire together wire together. This learning rule didn't really work well, and it was unstable, but it was one of the first learning rules that was put in place.
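Hebb's rule is usually written as Δw_i = η · x_i · y: strengthen a weight whenever its input x_i and the output y are active at the same time. A minimal sketch of the rule and its instability (the function name, inputs, and learning rate here are illustrative choices, not from the lecture):

```python
# Hebb's rule ("cells that fire together, wire together"): strengthen a
# weight whenever its input and the output are active together.
# Illustrative sketch only -- names and the learning rate are made up.

def hebbian_update(w, x, y, lr=0.1):
    # delta w_i = lr * x_i * y  -- purely correlational, no error signal
    return [wi + lr * xi * y for wi, xi in zip(w, x)]

w = [0.0, 0.0]
for _ in range(5):
    x = [1.0, 1.0]                                   # both input "cells" fire
    y = 1.0 + sum(wi * xi for wi, xi in zip(w, x))   # the output fires too
    w = hebbian_update(w, x, y)

print(w)  # both weights keep growing across iterations
```

Because the update is purely correlational, with no error signal, the weights only ever grow here, which is the instability mentioned above.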
And finally, in 1958, you started seeing some advances in artificial neural networks. This is when Rosenblatt came up with the perceptron algorithm for a single-layer neural network, which is basically a linear classifier. The perceptron algorithm was being used until very recently, and it showed a lot of success; it was actually very powerful, and there was a lot of excitement around it. [01:02:20] In 1959, we started seeing the analog of linear regression, which was ADALINE, and a multi-layer extension of it, MADALINE. MADALINE was actually used for removing echoes on phone lines, and this was again one of the very first times that people used artificial neural networks for a real application. And 1969 was an important year for artificial neural networks.
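Rosenblatt's perceptron update is simple: when the prediction is wrong, nudge the weights toward (or away from) the misclassified example. A from-scratch sketch on the AND function, which is linearly separable (the dataset, epoch count, and names are my own choices, not from the lecture):

```python
# Perceptron: a single "neuron" that learns a linear decision boundary.
# Trained here on AND, which is linearly separable.

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(data, epochs=20, lr=1.0):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            err = target - predict(w, b, x)   # +1, 0, or -1
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x) for x, _ in AND])  # [0, 0, 0, 1], matching the targets
```

On a linearly separable dataset like this, the perceptron convergence theorem guarantees the algorithm finds a separating line in finitely many updates.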
That year, Minsky and Papert wrote a book on artificial neural networks called Perceptrons, in which they basically tried to analyze the mathematical properties of linear models. What they showed was actually something very simple: a single-layer neural network is a linear classifier, and it is not going to be able to compute the XOR function. This book is really associated with shutting down research on artificial neural networks; that was a time when people started thinking that maybe these kinds of artificial neural networks are not very powerful and maybe we should stop doing research on them, even though the book wasn't really saying anything about these more general neural networks. [01:03:28] Regardless, we started seeing a revival of neural networks around the 1980s, and this was with the rise of convolutional neural networks.
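The XOR limitation from Minsky and Papert's book can be checked directly: no linear threshold unit w1*x1 + w2*x2 + b > 0 gets all four XOR cases right. A brute-force sketch over a grid of candidate weights (the grid is an illustration, not a proof; a short algebraic argument follows):

```python
# Checking the Perceptrons-book observation by brute force: no linear
# threshold unit w1*x1 + w2*x2 + b > 0 computes XOR on all four inputs.

XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def separates(w1, w2, b, data):
    return all((w1 * x[0] + w2 * x[1] + b > 0) == bool(t) for x, t in data)

grid = [i / 4 for i in range(-20, 21)]  # candidate values in [-5, 5]
found = any(separates(w1, w2, b, XOR)
            for w1 in grid for w2 in grid for b in grid)
print(found)  # False: no candidate linear unit solves XOR
```

The algebra is short: the four constraints b <= 0, w2 + b > 0, w1 + b > 0, and w1 + w2 + b <= 0 are contradictory, since adding the two middle ones gives w1 + w2 + 2b > 0, which together with b <= 0 contradicts the last one.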
This came under the umbrella of connectionism. These very first convolutional neural networks were trained in a very ad hoc way, but 1986 was around the time we started seeing better, more principled ways of training these systems. Around this time, Rumelhart, Hinton, and Williams popularized, or kind of reinvented, the idea of backpropagation, and that added a lot more principle to how we should train these systems. [01:04:11] And 1989 was again one of the first times you started seeing these systems used in practice: Yann LeCun applied convolutional neural networks to recognizing handwritten digits, and he actually deployed this with USPS for detecting the digits of zip codes, which was really exciting.
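Backpropagation, as popularized by Rumelhart, Hinton, and Williams, is just the chain rule applied layer by layer. A sketch for a tiny 2-2-1 sigmoid network: compute the gradient of a squared-error loss with hand-coded backprop, then confirm it against a finite-difference estimate (the network size, seed, and names are illustrative assumptions, not from the lecture):

```python
import math
import random

# Backprop sketch: a 2-2-1 sigmoid network. We compute gradients of a
# squared-error loss with the chain rule, then verify one of them
# numerically with a central difference.

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # hidden
b1 = [0.0, 0.0]
W2 = [random.uniform(-1, 1) for _ in range(2)]                      # output
b2 = 0.0

def forward(x):
    h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(2)]
    y = sigmoid(W2[0] * h[0] + W2[1] * h[1] + b2)
    return h, y

def loss(x, t):
    _, y = forward(x)
    return 0.5 * (y - t) ** 2

x, t = [1.0, 0.0], 1.0
h, y = forward(x)

# Backward pass: push d(loss)/dy through the output unit, then the hidden layer.
dy = (y - t) * y * (1 - y)                                # error at the output
dh = [dy * W2[j] * h[j] * (1 - h[j]) for j in range(2)]   # at each hidden unit
grad_W1_00 = dh[0] * x[0]                                 # d(loss)/d W1[0][0]

# Numerical check: nudge W1[0][0] and compare with a central difference.
eps = 1e-6
W1[0][0] += eps
up = loss(x, t)
W1[0][0] -= 2 * eps
down = loss(x, t)
W1[0][0] += eps
numeric = (up - down) / (2 * eps)
print(abs(grad_W1_00 - numeric) < 1e-6)  # True: backprop matches the numerics
```

A training loop just applies these gradients repeatedly (w -= lr * grad), and with a hidden layer such a network can represent XOR, the function a single perceptron cannot.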
But still, this idea of artificial neural networks was a niche; it wasn't something everyone was working on. And that lasted until the era of deep learning, the 2010s. Part of the reason was that it was actually really difficult to train these models. But in 2006, Hinton et al. developed an unsupervised layer-wise pre-training system that helped with pre-training some of these neural networks, reducing the effort that goes into training these models. [01:05:02] The break really happened around 2012, when we started seeing systems like AlexNet that were giving us huge gains in object recognition. These systems basically revolutionized and transformed the field of computer vision overnight: the kind of computer vision course that I
took in 2012 was actually pre-neural networks, and it was very different from what is being taught today; this basically changed the field overnight, with the rise of convolutional neural networks and training these systems to do object recognition. [01:05:37] And finally, in 2016, we started seeing things like AlphaGo, another breakthrough. AlphaGo was basically using deep reinforcement learning to defeat a world champion Go player. That was a game people were thinking was a lot more difficult, and it was really exciting to see deep learning and deep reinforcement learning able to solve some of these problems. [01:06:02] All right, let me try to wrap up, because I know it's almost 3 p.m., and we'll release modules on the rest of the lecture later today.
But let me just give you some food for thought. I've talked about symbolic AI, and I have talked about neural AI. Symbolic AI is really a top-down view whose roots go back to logic: you had these very big goals, like building a virtual assistant. Neural AI, on the other hand, is more bottom-up: it's trying to solve these perceptual tasks. The two might seem to have very philosophical differences, they might seem contradictory, but they're actually not; there are a lot of deeper connections between them, and today people are actually thinking about integrating them in ways that we weren't able to do before. And even if you go back to the history of it, what McCulloch and Pitts were doing with the first neural network
was actually analyzing the properties of a logical system. Or AlphaGo: if you think about AlphaGo, it's a very logical game, you write the rules of the game in logic, and it's using neural networks to solve that game. So there are deeper connections between these two views of AI, and they really come together. [01:07:22] All right, sorry for going over a little bit. What is left of this lecture is really talking a little bit about statistical AI, and we'll release modules on these, thinking about where statistical AI comes into play and wrapping up the history of AI. The other part that is left is talking about AI and some of its risks and benefits, which is something you guys talked a little bit about during the breakout rooms, but we'll
also talk about that in lecture, so if you want to watch these lectures later, that would be cool. And with that, if there are any questions, I can take any questions; otherwise, I'll see you guys next week.

================================================================================
LECTURE 002
================================================================================
AI History | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=z8fEXuH0mu0
---
Transcript

[00:00:05] The next thing I want to do is talk a bit about the history of AI. Obviously, the history of AI here is going to be necessarily abbreviated and simplified, but I just want to give you an appreciation for how multifaceted the history is, and how rich and sometimes controversial it is. [00:00:24] A natural starting point to talk about the history of AI is Alan Turing's famous 1950 paper, Computing Machinery and Intelligence. In this paper, he asked the question: can machines think?
And he proposes the imitation game as his solution, more popularly known as the Turing test. Some of you probably know it: the Turing test is said to be passed by a machine if it can fool a human judge into thinking that it is actually a human being. [00:00:54] This paper is remarkable not because it built a system or proposed new methods, but because it framed the philosophical discussion of what intelligence is for years to come. And you just have to appreciate how difficult a notion intelligence is to pin down; this was really the first actionable, formal answer to the question: can machines think? Now, whether you think that working on the Turing test is a good idea that will lead to progress is questionable and controversial, but at least philosophically it's quite thought-provoking. [00:01:30] So for us, one major takeaway of the Turing test, which was not really
highlighted, is this objective specification. Note that the test itself is meant to capture what a system ought to be doing, independent of how you get there: it doesn't say whether the system should be using neural networks or logic-based methods, and so on. And this modularity is going to be really important to us in this course. [00:02:00] At the end of the paper, Turing does speculate on what might work. He talks about two possible approaches. You could take a top-down approach and try to tackle abstract problems such as chess; this is the route taken by symbolic AI. Or you could, quote unquote, provide the machine with the best sense organs, that is, sensors, and teach it like a child; this is more the approach taken by neural and statistical AI. Both have been tried, and we'll see how all three types of AI, symbolic, neural, and statistical,
kind of meld together at the end. [00:02:43] So to start our first story, let's go to the summer of 1956. The place was Dartmouth College. John McCarthy, who actually founded the Stanford AI Lab, organized a workshop; he gathered the brightest minds of the time. In attendance were Marvin Minsky, Allen Newell, and Herbert Simon, all of whom went on to make seminal contributions to AI. The participants set out a not-so-modest proposal: they claimed that every aspect of learning, or any other feature of intelligence, can be so precisely described that a machine can be made to simulate it. So they were really shooting for the moon; they were after generality. And this was post-war, computers were just coming on the scene, it was a really exciting time, and people were really ambitious. [00:03:38] During this time there were a few systems that were built. Arthur Samuel built a
computer program that could play chat checkers at a reasonable amateur [00:03:47] chat checkers at a reasonable amateur level and actually featured some uh you [00:03:50] level and actually featured some uh you know machine learning [00:03:52] know machine learning um [00:03:53] um ali newell and herbert simon [00:03:55] ali newell and herbert simon came up with a logic theorist that could [00:03:57] came up with a logic theorist that could prove theorems [00:03:59] prove theorems for one theorem they actually found a [00:04:00] for one theorem they actually found a proof that was better than the human [00:04:02] proof that was better than the human written proof and they tried to submit a [00:04:04] written proof and they tried to submit a paper on the result but the paper got [00:04:06] paper on the result but the paper got rejected because [00:04:08] rejected because the reviewers said it was not a new [00:04:10] the reviewers said it was not a new theorem [00:04:11] theorem what the reviewers didn't realize that [00:04:13] what the reviewers didn't realize that the third author was actually a computer [00:04:15] the third author was actually a computer program [00:04:17] program later they worked generalized these [00:04:19] later they worked generalized these ideas to the general problem solver [00:04:22] ideas to the general problem solver which [00:04:22] which was aimed at solving any problem [00:04:24] was aimed at solving any problem provided it could be suitably encoded in [00:04:26] provided it could be suitably encoded in logic and again this carries forward the [00:04:29] logic and again this carries forward the ambitious general intelligence [00:04:31] ambitious general intelligence agenda [00:04:34] sand this was a time of high optimism [00:04:38] sand this was a time of high optimism with the leaders of the field who are [00:04:40] with the leaders of the field who are all really impressive thinkers [00:04:42] all really impressive thinkers 
predicting ai would be solved in a [00:04:45] predicting ai would be solved in a matter of years [00:04:48] but we know that [00:04:50] but we know that they didn't get solved in 10 years and [00:04:52] they didn't get solved in 10 years and there were some [00:04:54] there were some tasks such as machine translations which [00:04:56] tasks such as machine translations which were very stubborn [00:04:57] were very stubborn so this is now a folklore story i don't [00:05:00] so this is now a folklore story i don't know how true it is but it's amusing [00:05:01] know how true it is but it's amusing nonetheless um you take a sentence like [00:05:04] nonetheless um you take a sentence like the spirit is willing but the flesh is [00:05:05] the spirit is willing but the flesh is weak [00:05:06] weak translate it into russian which was the [00:05:09] translate it into russian which was the favorite language for translation in the [00:05:11] favorite language for translation in the 50s and you translate it back [00:05:14] 50s and you translate it back and then you get the vodka is good but [00:05:16] and then you get the vodka is good but the meat is rotten [00:05:18] the meat is rotten so [00:05:19] so uh this was [00:05:21] uh this was less than amusing to the government [00:05:23] less than amusing to the government funding agencies [00:05:24] funding agencies who decided to write a report showing [00:05:26] who decided to write a report showing how really machine translation wasn't [00:05:28] how really machine translation wasn't going anywhere and cut off funding this [00:05:30] going anywhere and cut off funding this led to the first ai winter [00:05:34] led to the first ai winter so what went wrong here [00:05:36] so what went wrong here so there's two things [00:05:39] so there's two things first is that most of the approaches [00:05:42] first is that most of the approaches involved casting problems as logical [00:05:44] involved casting problems as logical 
reasoning which required a search over [00:05:46] reasoning which required a search over an exponentially large state space and [00:05:48] an exponentially large state space and the hardware at the time was just simply [00:05:50] the hardware at the time was just simply too limited [00:05:52] too limited and secondly [00:05:54] and secondly even if the research had infinite [00:05:56] even if the research had infinite compute they would still not be able to [00:05:58] compute they would still not be able to solve ai because there's just too many [00:06:00] solve ai because there's just too many concepts in the world words objects and [00:06:04] concepts in the world words objects and all this information has to somehow be [00:06:06] all this information has to somehow be put [00:06:07] put into the ai system [00:06:10] into the ai system so these grand ambitions weren't [00:06:11] so these grand ambitions weren't realized but nonetheless there were some [00:06:14] realized but nonetheless there were some useful contributions [00:06:16] useful contributions many due to john mccarthy that came out [00:06:18] many due to john mccarthy that came out of this era [00:06:19] of this era first lisp was invented for ai and [00:06:22] first lisp was invented for ai and arguably it's still the world's most [00:06:25] arguably it's still the world's most advanced programming language [00:06:27] advanced programming language garbage collection is something that if [00:06:28] garbage collection is something that if you're programming only in python it [00:06:31] you're programming only in python it allows you to not know what garbage [00:06:32] allows you to not know what garbage collection is and time sharing the [00:06:35] collection is and time sharing the ability to use a single computer by [00:06:37] ability to use a single computer by multiple people was prescient at the [00:06:39] multiple people was prescient at the time [00:06:42] so then fast forward to the 70s and 80s 
[00:06:46] so then fast forward to the 70s and 80s knowledge was the key word [00:06:48] knowledge was the key word and ai researchers thought knowledge was [00:06:50] and ai researchers thought knowledge was the key to combat both the computation [00:06:53] the key to combat both the computation and information limitations of the [00:06:55] and information limitations of the previous era [00:06:56] previous era and at that time expert systems became [00:06:58] and at that time expert systems became very fashionable [00:07:00] very fashionable where a domain expert could encode [00:07:02] where a domain expert could encode knowledge in the form of rules usually [00:07:04] knowledge in the form of rules usually looking like this [00:07:06] looking like this and [00:07:07] and there was a noticeable shift [00:07:09] there was a noticeable shift as well to solve it all optimism from [00:07:11] as well to solve it all optimism from the 50s and 60s was gone and instead [00:07:13] the 50s and 60s was gone and instead researchers focused on very practical [00:07:16] researchers focused on very practical systems targeted at particular domains [00:07:18] systems targeted at particular domains for example chemistry medical diagnosis [00:07:21] for example chemistry medical diagnosis and [00:07:22] and business operations [00:07:24] business operations and [00:07:26] and there were some good things knowledge [00:07:27] there were some good things knowledge did help [00:07:28] did help curb both the information complexity and [00:07:31] curb both the information complexity and also restricted the space state space so [00:07:34] also restricted the space state space so that it alleviated the computation [00:07:36] that it alleviated the computation burden [00:07:38] burden and this was the first time that ai had [00:07:41] and this was the first time that ai had real [00:07:42] real applications on industry [00:07:45] applications on industry but there were obviously problems 
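The if-then rules the lecture alludes to ("rules usually looking like this") can be sketched as a tiny forward-chaining engine. This is a minimal illustration, not any actual historical system; the medical-flavored facts and rule names below are invented for the example.

```python
# Minimal sketch of an expert-system-style rule engine (forward chaining).
# The rules and facts are invented for illustration only.

def forward_chain(facts, rules):
    """Repeatedly fire rules whose conditions all hold until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in facts and all(c in facts for c in conditions):
                facts.add(conclusion)  # rule fires: add its conclusion
                changed = True
    return facts

# "IF fever AND cough THEN suspect flu" style rules (illustrative only).
rules = [
    ({"fever", "cough"}, "suspect_flu"),
    ({"suspect_flu", "high_risk"}, "recommend_test"),
]

derived = forward_chain({"fever", "cough", "high_risk"}, rules)
```

Real expert systems of this era held hundreds of such hand-written rules, which is exactly where the creation and maintenance burden came from.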
[00:07:45] But there were obviously problems. Deterministic rules couldn't handle the complexity and uncertainty of the real world, and moreover the rules quickly became too complex to create and maintain. Here is a quote from Terry Winograd, who, as some of you know, was on the HCI faculty at Stanford; before that he worked at MIT as an AI researcher. This is what he had to say: by the mid-70s he thought it was a dead end. There were just too many complex interactions between all the components, no easy footholds, and you couldn't hold a mental model of what was going on in your head. Moreover, there was a lot of over-promising and under-delivering; the field collapsed again, and it really seemed that history was repeating itself.

[00:08:36] So at this point we're going to leave aside the story of symbolic AI, which dominated AI for multiple decades, and go back in time to 1943 to tell the story of neural AI.

[00:08:48] 1943 is the year often credited as the birth of artificial neural networks. McCulloch and Pitts devised a simple model of the neuron and studied its mathematical properties, but they didn't do anything in the way of learning the model's parameters. In 1949 came the first learning rule, from Donald Hebb, based on the mantra that cells that fire together wire together; it was nice and simple, but it didn't really work. In 1958 Rosenblatt came up with the perceptron algorithm for learning single-layer artificial neural networks, aka linear classifiers, which actually turned out to work really well and was still in use fairly recently. In 1959 Widrow and Hoff came up with an analog for linear regression, along with a multi-layer generalization called MADALINE, which was actually used to eliminate echoes on phone lines; this was one of the first real-world applications of neural networks.

[00:09:54] Then came 1969, and this was a big year. Marvin Minsky and Seymour Papert wrote a small book called Perceptrons, in which they analyzed perceptrons and proved various mathematical properties. One of their results, almost trivial, showed that a single-layer perceptron cannot represent the XOR function. Even though that says nothing about the capabilities of deeper networks, the book is largely credited with shutting down neural network research and with the continued rise of symbolic AI. It's a really interesting piece of history, and I encourage you to go examine it.

[00:10:38] In the 80s, neural networks started coming back. 1980 saw the first convolutional neural network, which was trained in a kind of ad hoc way.
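Minsky and Papert's XOR observation is easy to reproduce. The sketch below (arbitrary hyperparameters and data encoding, not from the lecture) trains a single linear unit with the classic perceptron update: it converges on AND, which is linearly separable, but can never reach perfect accuracy on XOR, which is not.

```python
# Minimal sketch of the perceptron learning rule, illustrating the
# single-layer limitation: AND is linearly separable, XOR is not.

def train_perceptron(data, epochs=25):
    """data: list of ((x1, x2), label) pairs with labels in {-1, +1}."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            if y * (w1 * x1 + w2 * x2 + b) <= 0:  # misclassified point
                w1 += y * x1                      # perceptron update
                w2 += y * x2
                b += y
    return w1, w2, b

def accuracy(data, w1, w2, b):
    correct = sum(
        1 for (x1, x2), y in data
        if y * (w1 * x1 + w2 * x2 + b) > 0
    )
    return correct / len(data)

AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

and_acc = accuracy(AND, *train_perceptron(AND))  # converges: linearly separable
xor_acc = accuracy(XOR, *train_perceptron(XOR))  # stays below 100%: not separable
```

A network with even one hidden layer can represent XOR, which is why the result said nothing about deeper networks.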
In 1986, Rumelhart, Hinton, and Williams reinvented and popularized backpropagation for multi-layer networks, and training became a little more principled. In 1989, Yann LeCun devised a convolutional network that could recognize handwritten digits; it was actually deployed at the USPS to read zip codes, and this was one of the first major success stories of neural networks. But until the mid-2000s, neural network research was still fairly niche, I would say, and the networks were notoriously hard to train. In 2006 this started to change: Geoff Hinton and his colleagues published a paper showing how unsupervised layer-wise pre-training could mitigate some of these effects, and the term "deep learning" started being used around this time as well.

[00:11:46] But it was really 2012, I would say, that was the major break for neural networks. Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton wrote a landmark paper introducing what is now called AlexNet, a convolutional network that delivered huge gains in object recognition. The computer vision community was very skeptical at the time, and almost overnight the result completely transformed the field; computer vision without neural networks today almost feels like a distant memory. 2016 was another big event: AlphaGo defeated Lee Sedol at Go, something experts thought was still many decades away, and that more firmly established deep learning as the dominant paradigm in AI, which continues even to the modern day.

[00:12:42] But let's reflect so far. We have seen two intellectual traditions: symbolic AI, with its roots in logic, and neural AI, with its roots in neuroscience. The two have fought fiercely over the decades over philosophical differences, but I want to suggest some food for thought: maybe there are deeper connections here. Remember the McCulloch and Pitts paper that introduced neural networks, arguably the root of deep learning? They spent most of it talking about how their model can actually encode logical operations. And the game of Go is actually a perfectly logical game, defined by a few elegant, simple rules, but AlphaGo used the powerful pattern-matching capabilities of neural networks to solve this otherwise logical game. So there may be room for more symbiosis than we think.

[00:13:37] So now there's a third and final story that we must tell to complete the picture. This story is not really about AI per se; it's about the influx of ideas from other areas that have helped shape and form a mathematical foundation for AI, and we call this statistical AI. Machine learning is very popular now, but the idea of fitting models from data, which is at the core of machine learning, goes far back, even to Gauss and Legendre, who at the beginning of the 19th century developed least squares for linear regression. Classification also appeared very early in statistics. AI also involves sequential decision-making problems: for deterministic versions there is Dijkstra's algorithm from the algorithms community, and for models with uncertainty, Bellman in control theory created Markov decision processes. And notice that all of these developments largely predated the 40s and 50s, when AI really started springing up.
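The least-squares idea traced back to Gauss and Legendre can be shown in a few lines. This is a generic sketch with made-up data points, using the closed-form solution for simple one-variable linear regression.

```python
# Minimal sketch of least squares for simple linear regression.
# The data points are made up for illustration.

def fit_least_squares(xs, ys):
    """Return slope a and intercept b minimizing sum((a*x + b - y)**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed form: a = cov(x, y) / var(x), b = mean_y - a * mean_x.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Points lying exactly on y = 2x + 1, so the fit should recover a=2, b=1.
a, b = fit_least_squares([0, 1, 2, 3], [1, 3, 5, 7])
```

The same "define an objective, then optimize it" pattern is the statistical-AI viewpoint the course builds on.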
[00:14:51] paying close attention that where we left symbolic ai was at the end [00:14:54] where we left symbolic ai was at the end of the 80s [00:14:55] of the 80s but where neural ai started really [00:14:56] but where neural ai started really gaining traction was [00:14:58] gaining traction was the 2010s so what was going on between [00:15:02] the 2010s so what was going on between and what was going on between was that [00:15:05] and what was going on between was that there was a period where the term ai [00:15:08] there was a period where the term ai wasn't [00:15:08] wasn't really used at least not to the extent [00:15:10] really used at least not to the extent that it is today [00:15:12] that it is today and i think that part of it was to [00:15:14] and i think that part of it was to distance um [00:15:16] distance um to add distance to the failed attempts [00:15:19] to add distance to the failed attempts of the recent account of ai winter [00:15:22] of the recent account of ai winter and also because the goals were just [00:15:23] and also because the goals were just more down to earth people talked about [00:15:25] more down to earth people talked about machine learning [00:15:26] machine learning and then during that period there were [00:15:29] and then during that period there were two paradigms [00:15:31] two paradigms there was bayesian networks developed in [00:15:33] there was bayesian networks developed in 80s by judea pearl which provided [00:15:36] 80s by judea pearl which provided reasoning under uncertainty framework [00:15:40] reasoning under uncertainty framework which is something that a symbolic ai [00:15:42] which is something that a symbolic ai didn't have a satisfying answer for 1995 [00:15:45] didn't have a satisfying answer for 1995 support vector machines were developed [00:15:48] support vector machines were developed derived from ideas from learning theory [00:15:50] derived from ideas from learning theory and optimization 
[00:15:52] and optimization and at that time svms were easier to [00:15:54] and at that time svms were easier to turn tuned than neural networks and [00:15:56] turn tuned than neural networks and really became the favorite tool in [00:15:57] really became the favorite tool in machine learning before deep learning [00:15:59] machine learning before deep learning started taking off again [00:16:03] so to kind of wrap up [00:16:04] so to kind of wrap up you know the there's three stories that [00:16:06] you know the there's three stories that we talked about symbolic ai [00:16:09] we talked about symbolic ai took a top-down approach [00:16:11] took a top-down approach and really failed to deserve on its [00:16:13] and really failed to deserve on its original promise but it did offer a [00:16:16] original promise but it did offer a vision and built impressive artifacts [00:16:18] vision and built impressive artifacts like question answering and dialogue [00:16:20] like question answering and dialogue system managing trying to do this on [00:16:22] system managing trying to do this on ancient hardware in the 60s [00:16:26] ancient hardware in the 60s neural ai took a completely different [00:16:28] neural ai took a completely different approach proceeding bottom up starting [00:16:30] approach proceeding bottom up starting with simple perceptual tasks which the [00:16:33] with simple perceptual tasks which the symbolic community wasn't interested at [00:16:35] symbolic community wasn't interested at a time i compared machine translation [00:16:37] a time i compared machine translation with removing echoes on phone lines for [00:16:39] with removing echoes on phone lines for example but in the end it offered a [00:16:42] example but in the end it offered a class of models and a way of thinking [00:16:44] class of models and a way of thinking about data that has proven [00:16:47] about data that has proven capable of conquering [00:16:49] capable of conquering today's 
ambitious problems [00:16:51] today's ambitious problems and finally statistical ai [00:16:54] and finally statistical ai foremost [00:16:55] foremost for us will offer mathematical rigor and [00:16:57] for us will offer mathematical rigor and clarity for example in the course when [00:16:59] clarity for example in the course when we define define objective functions [00:17:02] we define define objective functions separate from optimization or have a [00:17:04] separate from optimization or have a language to talk about the complexity of [00:17:06] language to talk about the complexity of a model in learning [00:17:07] a model in learning these ideas and language all stem from [00:17:10] these ideas and language all stem from statistical ai and the course will [00:17:13] statistical ai and the course will actually be presented mostly through the [00:17:15] actually be presented mostly through the lens of statistical ai but i want to [00:17:17] lens of statistical ai but i want to highlight that all three views are kind [00:17:19] highlight that all three views are kind of compatible and just offer different [00:17:22] of compatible and just offer different advantages on the same underlying [00:17:24] advantages on the same underlying ideas [00:17:26] ideas stepping back you know the modern world [00:17:28] stepping back you know the modern world of ai is kind of like new york city it's [00:17:30] of ai is kind of like new york city it's a melting pot that has drawn largely [00:17:33] a melting pot that has drawn largely from a lot of different fields [00:17:34] from a lot of different fields statistics algorithms neuroscience [00:17:36] statistics algorithms neuroscience economics and it's really a symbiosis [00:17:39] economics and it's really a symbiosis between all these fields and how they [00:17:42] between all these fields and how they come together and allow you to tackle [00:17:44] come together and allow you to tackle real-world applications that makes our 
[00:17:46] real-world applications that makes our ai so rewarding [00:17:50] okay so that ends the [00:17:53] okay so that ends the uh the ai [00:17:55] uh the ai history module [00:17:57] history module you can read much more about it at a few [00:17:59] you can read much more about it at a few links at the end of these slides ================================================================================ LECTURE 003 ================================================================================ Artificial Intelligence Today | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=C0IhR4D5KYc --- Transcript [00:00:05] so if i had to use one word to describe [00:00:08] so if i had to use one word to describe ai today it would be [00:00:10] ai today it would be surreal [00:00:12] surreal it's kind of hard for me to imagine that [00:00:13] it's kind of hard for me to imagine that 10 years ago it was very much an [00:00:15] 10 years ago it was very much an academic endeavor and now countries are [00:00:18] academic endeavor and now countries are forming national strategies around they [00:00:20] forming national strategies around they are what [00:00:22] are what so the ai index [00:00:23] so the ai index is a project that aims to track [00:00:27] is a project that aims to track the status of ai and each year they [00:00:28] the status of ai and each year they release an annual report [00:00:30] release an annual report um here are some quotes from this report [00:00:32] um here are some quotes from this report compute doubling every 3.4 months um the [00:00:35] compute doubling every 3.4 months um the conference neurops [00:00:37] conference neurops increased over 800 percent in the last [00:00:40] increased over 800 percent in the last eight years number of jobs is also going [00:00:42] eight years number of jobs is also going up and so quantitatively at least we see [00:00:45] up and so quantitatively at least we see that you know shouldn't be 
surprising to [00:00:47] that you know shouldn't be surprising to people that ai [00:00:49] people that ai is just becoming a big deal [00:00:52] is just becoming a big deal qualitatively [00:00:54] qualitatively what [00:00:55] what i think is really interesting is that ai [00:00:57] i think is really interesting is that ai is transitioning [00:00:59] is transitioning from being in the lab to the real world [00:01:01] from being in the lab to the real world for a long time ai was limited to [00:01:03] for a long time ai was limited to relatively artificial environments which [00:01:05] relatively artificial environments which was useful for developing [00:01:07] was useful for developing methods [00:01:08] methods but now we're seeing real world [00:01:10] but now we're seeing real world deployment in ways that really impact [00:01:13] deployment in ways that really impact people's lives [00:01:15] people's lives and i want to stress that ai [00:01:17] and i want to stress that ai like any technology is an amplifier it [00:01:19] like any technology is an amplifier it makes what is good better and makes what [00:01:22] makes what is good better and makes what is bad worse and we really need to be [00:01:24] is bad worse and we really need to be aware of both sides [00:01:26] aware of both sides so let me start with the positives the [00:01:28] so let me start with the positives the prospects [00:01:30] prospects so here are some examples in which ai [00:01:33] so here are some examples in which ai has been well beneficial so in the last [00:01:37] has been well beneficial so in the last decade speech recognition question [00:01:38] decade speech recognition question answerings have gone [00:01:40] answerings have gone remarkably good and now you can talk to [00:01:42] remarkably good and now you can talk to your favorite assistant and expect some [00:01:44] your favorite assistant and expect some basic though obviously not perfect level [00:01:46] basic though 
obviously not perfect level of language understanding my [00:01:48] of language understanding my three-year-old is growing up thinking [00:01:50] three-year-old is growing up thinking that talking to computers is perfectly [00:01:51] that talking to computers is perfectly normal [00:01:53] normal and you know search engines like google [00:01:55] and you know search engines like google have told us [00:01:56] have told us that [00:01:57] that you know enabling [00:01:59] you know enabling power comes that comes with um [00:02:02] power comes that comes with um being able to tap into the world's rich [00:02:04] being able to tap into the world's rich information and now taking one step [00:02:07] information and now taking one step further these assistants allow uh this [00:02:09] further these assistants allow uh this information to be more efficiently and [00:02:12] information to be more efficiently and naturally accessible which could be [00:02:14] naturally accessible which could be especially useful for people do not have [00:02:16] especially useful for people do not have the means to use a computer [00:02:20] so there's language barriers in the [00:02:21] so there's language barriers in the world that pose significant [00:02:24] world that pose significant significant challenges to travelers [00:02:27] significant challenges to travelers immigrants [00:02:28] immigrants businesses minority subject communities [00:02:31] businesses minority subject communities and so connecting people is very [00:02:33] and so connecting people is very valuable so machine translation aims to [00:02:35] valuable so machine translation aims to overcome these barriers [00:02:36] overcome these barriers machine translation has come a long way [00:02:39] machine translation has come a long way since the 60s and while it's far from [00:02:41] since the 60s and while it's far from perfect it is really good enough for [00:02:43] perfect it is really good enough for someone to get the 
basic gist of a document written in a different language, or to have a real-time conversation with someone speaking in a completely different language.

[00:02:53] Autonomous driving will someday hopefully be able to reduce the number of accidents and congestion, but a major challenge is to recognize what is going on in an unstructured environment. Computer vision has made a lot of progress towards recognizing these objects, but there is still headroom to ensure sufficient reliability. An interesting application is visual assistive technology. An example is Seeing AI from Microsoft Research, where you point a camera at something and it narrates what's going on there, and this obviously could be a game changer for the visually impaired. Auto-captioning technology is the opposite, which is also potentially very impactful, turning sound into
sight.

[00:03:41] Healthcare is another big area that's growing in importance, both for diagnosis and therapeutic development, especially in areas where there is a shortage of clinical expertise. An example of this is detecting diseases based on chest X-rays, or diagnosing diabetic retinopathy, which is one of the major challenges in AI and healthcare these days. There's also an interesting recent dataset that shows images of COVID-19-infected cells and how they respond to certain drugs, with the hope that one day we can find drugs that can treat late-stage COVID-19.
[00:04:30] Poverty is a big problem in the world, but even figuring out the areas in greatest need is challenging. So recently people have been using satellite imagery to try to figure this out, because gathering survey data on the ground is very expensive. Using machine learning, you can look at satellite images and try to predict various wealth indicators. This could be really useful for governments and NGOs to take action and monitor progress.

[00:05:04] So this all sounds great, right? So what's the catch? Well, there are a lot of things that one has to be aware of. I just want to give you a general idea of the space, and I'm going to go fairly quickly here. First is energy consumption. There is a genuine cost to training the high-performing models that we're seeing today. If we look at NLP, there has been a trend of training more and
more large language models. Back in 2018, which is like ancient history now, models had only about 100 million parameters. Then BERT came along, which some of you might have heard of, with a big splash, at around 300 million. Then in January this year Microsoft released a model with 17 billion parameters, and to top it off, in May OpenAI released a 10-times-larger model with 175 billion parameters. So this is big. Last year there was a paper published that talked about the carbon footprint of training these models. They looked at a transformer with 200 million parameters, which would be around here on this graph, and they showed that training even this, if you use neural architecture search, was five times the amount of CO2 emissions of the entire lifetime of a U.S. car. So now I'll leave you to speculate what GPT-3's environmental
footprint is. So, needless to say, a lot of people are actively trying to reduce these models' size and improve efficiency without sacrificing accuracy.

[00:06:49] Privacy is another big area. Machine learning algorithms have really been developed assuming that data is just sitting there in one place and is fully accessible. But our mobile phones generate a wealth of information, and we might not want to be sending all that information up to some big internet company. Recently there's been a lot of active work in privacy-preserving machine learning, which allows some of the learning to happen on-device in a decentralized way, transmitting only various essential statistics to a central server.

[00:07:21] Security is another major challenge, especially in high-stakes applications like autonomous driving and face identification for
authentication. Here, models not only need to be accurate but robust against attackers and malicious behavior, which we know exists in the world. Researchers have shown that you can construct adversarial examples: if you take images of stop signs and you post these stickers on them, you can get a state-of-the-art system to think that these are speed limit signs. Or you can actually buy these cool-looking glasses that will trick face ID into thinking that you're some celebrity that you're not. So guarding against these attacks, which is kind of frightening, is still a wide-open problem.

[00:08:09] Bias was mentioned in the chat. This is something that's maybe less spectacular in terms of sudden impact, but I think it's more pernicious. Here's an example from machine translation. If you take Hungarian, and you have the
words he and she, which are not differentiated, and you translate them into English, the machine translation model has to hallucinate the gender, and you can reveal all sorts of stereotypes that the model is harboring. For example: she is a nurse, baker, wedding organizer, but he is a scientist, engineer, teacher, and CEO. And there's a lot of active work showing how hard it is to actually remove these biases. So I want to say that machine learning algorithms are based on quote-unquote objective mathematical principles, but the trained models are trained to latch on to statistics in the data, and the data comes from society. So any biases in society are reflected in the data and propagated to model predictions, and worse, sometimes they're even amplified.

[00:09:21] So here's another case study. Northpointe is a company that produces software called COMPAS that assesses whether someone is going to
commit a crime again. And ProPublica, a nonprofit organization that does investigative journalism, came out and said: whoa, you are not being fair, because given that an individual did not reoffend, Black people are twice as likely to be wrongly classified as higher risk than white people. But Northpointe defended themselves by saying that given a risk score of seven, sixty percent of white people reoffended and sixty percent of Black people reoffended, so therefore it's fair. Both of these actually turn out to be simply different desiderata of fairness, and unfortunately there are actually impossibility results that say you can't have these two and a third criterion hold for imperfect classifiers at the same time.
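The two desiderata can be made concrete in code. This is a minimal sketch with invented toy numbers (not the actual COMPAS data): ProPublica's criterion compares false positive rates across groups, while Northpointe's compares how often a high-risk label turns out to be correct.

```python
# Toy sketch of the two fairness desiderata (invented numbers, NOT real COMPAS data).
# Each record: (group, labeled_high_risk, actually_reoffended)
records = [
    ("A", True, True), ("A", True, True), ("A", True, False),
    ("A", True, False), ("A", False, False),
    ("B", True, True), ("B", True, False), ("B", False, False),
    ("B", False, False), ("B", False, False),
]

def false_positive_rate(group):
    """ProPublica-style criterion: among people who did NOT reoffend,
    what fraction were wrongly labeled high risk?"""
    innocent = [r for r in records if r[0] == group and not r[2]]
    return sum(r[1] for r in innocent) / len(innocent)

def precision(group):
    """Northpointe-style criterion: among people labeled high risk,
    what fraction actually reoffended?"""
    flagged = [r for r in records if r[0] == group and r[1]]
    return sum(r[2] for r in flagged) / len(flagged)

# Calibration-style parity holds: both groups' high-risk labels are right half the time.
print(precision("A"), precision("B"))                      # 0.5 0.5
# Yet error-rate parity fails: group A's non-reoffenders are flagged far more often.
print(false_positive_rate("A"), false_positive_rate("B"))  # 2/3 vs 0.25
```

With different base rates across the groups, the impossibility results referred to above say an imperfect classifier cannot satisfy criteria like these simultaneously.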
And given that these algorithms are actually being deployed and really impacting people's lives in a huge way, this indicates that we not only need to understand the technical implications of all these algorithms, but also think about the philosophical and policy-related issues as well.

[00:10:52] So this one's kind of scary: generating fake content. Deep learning has enabled us to generate deepfakes, such as Obama saying things that he never did, which you can find online. Or, more recently, this is a blog post written by our friend GPT-3 that made its way to number one on Hacker News. So it's completely clear, at least to me, that we've lost the ability to tell the difference between real and fake content, and given the ease and scale at which fake content can now be generated, bad actors spreading disinformation is, I think, a major threat to our society.

[00:11:32] Finally,
AI systems are being deployed in dynamic environments, where you have systems which are making predictions: serving you search results, giving you recommendations, serving ads. Users are taking actions, essentially by clicking, and these actions are recorded as data. This data is used to retrain the system, which further reinforces these actions. So I think there's a very dangerous feedback loop inherent in machine learning, where all these biases are amplified and polarized, and it leads to quite unstable behavior. So I think a major open research challenge is to figure out how to build more robust systems that are not as susceptible to these unstable dynamics.

[00:12:24] So, to conclude, I just want to stress that AI technology is an amplifier. We've seen that AI can, and promises to, be quite beneficial to society, reducing accessibility
barriers and improving efficiency. But on the other hand, it can also amplify biases, introduce new security risks, and centralize power in ways that were kind of unprecedented before. I just want you to keep these issues in mind as we go through the course. Just because you can build it doesn't mean you should, and if we're not careful, we could potentially build something that does more harm than good. Moreover, figuring out the best way to tread the line between positive prospects and negative risks is, I think, something that requires a deep technical understanding, especially if we are to develop novel solutions, and that's something that this course is going to equip you with.

[00:13:36] So that concludes this module.

================================================================================ LECTURE 004 ================================================================================
Artificial Intelligence and Machine Learning 1 - Overview | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=mtrYwgIrRNk
--- Transcript

[00:00:05] Hi, in this module I'm going to be talking about machine learning and give you an overview of all the topics we'll cover. So remember that machine learning is the process of taking data and converting it into models, and with those models you can go and perform inference and answer all sorts of questions. We're going to focus on reflex-based models; these are models, including linear classifiers and neural networks, in which inference is very fast and feed-forward, which makes them very attractive.

[00:00:34] So in a nutshell, this is what a reflex-based model is. We'll call a reflex-based model a predictor, and the predictor takes as input some x and produces some output y. In general, x can be something arbitrary, like an image or a text, and y is going to be restricted, and that particular restriction is going to determine what type of
prediction task we are talking about. We'll consider two common cases of prediction tasks here. The first is binary classification. In binary classification, the predictor is also called a classifier, and the output y is called a label, and that label can either be plus one for the positive class or minus one for the negative class.

[00:01:21] Some examples of binary classification problems: there's fraud detection, where x is a credit card transaction and we're trying to predict y, whether there's fraud or no fraud, so that the transaction can be blocked or not. Another example is moderating online discussion forums: the input x is an online comment, a piece of text, and you're trying to predict y, whether it's toxic or not, so that the comment can be flagged or taken down appropriately.
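The classifier abstraction above (input x in, label y in {+1, -1} out) can be sketched as a simple linear scorer; the fraud-detection feature names and weights here are invented purely for illustration.

```python
# Minimal sketch of a binary classifier f: x -> {+1, -1}.
# The features and weights are invented for illustration, not a real fraud model.

def predict(x, weights):
    """Score the input's features; +1 means positive class, -1 negative."""
    score = sum(weights.get(name, 0.0) * value for name, value in x.items())
    return +1 if score >= 0 else -1

# Hypothetical credit-card transaction, encoded as a feature dictionary:
weights = {"amount_usd": 0.001, "foreign_country": 2.0, "known_merchant": -3.0}
x = {"amount_usd": 500.0, "foreign_country": 1.0, "known_merchant": 1.0}
print(predict(x, weights))  # score = 0.5 + 2.0 - 3.0 = -0.5, so -1 (predict no fraud)
```

Inference here is a single fast feed-forward pass over the features, which is exactly what makes reflex-based models attractive.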
And finally, here's an example from physics. After the Higgs boson was discovered, scientists wanted to know: how does it decay? The Hadron Collider collected a bunch of data, which includes measurements of events; here x is a measurement of a particular event, and you're trying to predict whether it was a decay event or simply background.

[00:02:15] The second type of task we're going to consider is regression. In regression, y is going to be a real number, and it's generally known as the response. Here are some examples of regression problems. In poverty mapping, x is a satellite image, and you're trying to predict y, which is the asset wealth index of the homes in that area of the satellite image. In housing, you might want to predict the price using information about the house: location, number of bedrooms, year built.
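A regressor has the same shape as a classifier, except the real-valued score itself is the output y; the housing features, weights, and bias below are invented for illustration.

```python
# Minimal sketch of a regressor f: x -> real number (the "response").
# Feature names, weights, and bias are invented for illustration.

def predict_price(x, weights, bias):
    """Return a real-valued prediction, e.g. a house price in dollars."""
    return bias + sum(weights.get(name, 0.0) * value for name, value in x.items())

weights = {"bedrooms": 50_000.0, "year_built": 100.0}
x = {"bedrooms": 3.0, "year_built": 1990.0}
print(predict_price(x, weights, bias=10_000.0))  # 10000 + 150000 + 199000 = 359000.0
```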
And finally, you might be interested in predicting arrival times: given where you're going, what the conditions are at the time, and what time of day it is, you're trying to predict y, which is the time of arrival.

[00:03:05] So the main difference between regression and classification is that in classification y is a discrete entity, and in regression it is a continuous entity.

[00:03:17] The final thing we're going to talk about is structured prediction. Structured prediction is a little bit of a catch-all: in structured prediction, y is simply a complex object. Some examples include machine translation, where x, the input, is a sentence in one language and y is its translation in another language. Dialogue can also be cast as structured prediction: you're given a conversational history between a user and an agent, for example in a virtual assistant setting, and you're trying to predict y, which is the next utterance that the agent should say. Another example is image captioning, which might be
useful for visual assistive technologies: x is an image of a scene, and y is a sentence describing or narrating that scene.

[00:04:05] Image segmentation, which is useful for autonomous driving, takes an image of a scene as x and produces y, which is a segmentation of that scene into regions corresponding to objects in the real world.

[00:04:20] So it might seem daunting at first to be able to generate segmentations or sentences or texts, but there's a secret here, which is that many structured prediction problems can actually be decomposed into a sequence of multi-class classification problems, and this allows us to leverage the machinery that we'll develop for multi-class classification in structured prediction.
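The decomposition idea above can be sketched for generating a caption token by token, where each step is one multi-class classification over a vocabulary; the scorer here is a hard-coded stand-in, not a trained model.

```python
# Sketch: reduce structured prediction (producing a token sequence) to a series
# of multi-class classification steps. The scorer is a hard-coded stand-in; a
# real system would use a trained model to score (prefix, next-token) pairs.
VOCAB = ["a", "dog", "park", "runs", "<stop>"]

def score(prefix, token):
    # Placeholder scores, just to make the loop below run end to end.
    preferred = {(): "a", ("a",): "dog", ("a", "dog"): "runs",
                 ("a", "dog", "runs"): "<stop>"}
    return 1.0 if preferred.get(tuple(prefix)) == token else 0.0

def predict_sequence(max_len=10):
    prefix = []
    for _ in range(max_len):
        # One multi-class classification: choose the best next token from VOCAB.
        best = max(VOCAB, key=lambda t: score(prefix, t))
        if best == "<stop>":
            break
        prefix.append(best)
    return prefix

print(predict_sequence())  # ['a', 'dog', 'runs']
```

Each loop iteration is an ordinary multi-class decision, so the same classification machinery applies even though the final output y is a complex object.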
[00:04:47] So here is the roadmap of the rest of the modules in the machine learning unit. First, we're going to start with regression and classification, the bread and butter of machine learning. We're going to focus on the simplest setting: linear models, which we train using gradient descent. Then we're going to step over to algorithms and introduce stochastic gradient descent, which is going to give us major speedups over gradient descent. Next we're going to hop over to models and improve on linear models. First we'll show that even linear models can be pushed to their limits by using non-linear features with the linear machinery, and we can use feature templates to organize the set of features that we have. Then we'll talk about neural networks, which also allow you to have nonlinear predictors, but allow these nonlinearities to be learned from data. Following neural networks, we're going to look at the backpropagation algorithm for computing gradients automatically, so
manually, so you can train neural networks. [00:05:55] We're going to hop back over here and talk about differentiable programming, which is a generalization or extension of neural networks that will enable us to build all sorts of complicated deep learning models out of building blocks. [00:06:10] All of this is generally done in the context of supervised learning; we're going to touch on unsupervised learning a little bit and introduce the classical k-means algorithm for clustering points. [00:06:20] And finally we're going to end on a few notes. First is generalization: the question of, if you train a machine learning model on a particular set of data, is it able to generalize to a new set of examples? [00:06:35] And finally I'm going to talk about best practices like cross-validation, and how you do machine learning in practice. So that concludes this module.
================================================================================ LECTURE 005 ================================================================================ Artificial Intelligence & Machine Learning 2 - Linear Regression | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=nEWNNt2KmfQ --- Transcript [00:00:05] Hi, in this module I'm going to cover the basics of linear regression. [00:00:10] The story of linear regression begins on January 1, 1801. The Italian astronomer Piazzi looked up at the night sky and discovered something, which he named Ceres. He didn't know what it was, whether a comet or a planet, but he did make some observations of its location before it was obscured by the sun. [00:00:28] The data he collected looked like this: at a particular time, two numbers which represent the location of Ceres in the night sky. [00:00:40] So now the big question at the time was when and where Ceres was going to be observed again as it emerged from behind the sun. [00:00:48] All the top astronomers at the time tried to analyze this data and figure out the answer.
[00:00:55] So Carl Friedrich Gauss, the famous German mathematician, took Piazzi's data, created a model of Ceres's orbit, and made a prediction. [00:01:04] This prediction was actually wildly different from all the other predictions that other astronomers made, but in December, Ceres was located, and Gauss's prediction was by far the most accurate. [00:01:16] Now there's an interesting story here: Gauss was actually very secretive about what his method was, and in 1805 the French mathematician Legendre was actually the first to publish the method, before Gauss could publish in 1809, even though Gauss had had the method back in 1795.
The method here is none other than least squares linear regression, which is the topic of this module. [00:01:42] So here is the framework. We are given some training data, which consists of a set of examples; each example consists of an input x and an output y: (1, 1), (2, 3), (4, 3). [00:01:57] We can visualize these examples on a 2D plot, plotting y, the output, against x, the input: here is (1, 1), here is (2, 3), and here is (4, 3). [00:02:12] What we want to do is take this data and have a learning algorithm produce a predictor f, which in this case is, let's say, a line. What the predictor allows us to do is take a new input, such as this 3 here, send it through, and produce an output, 2.71, corresponding to this point on the line. [00:02:42] There are three design decisions that we need to make to flesh out this framework. First: what are the possible predictors that
the learning algorithm is allowed to output? Is it only lines, or curves as well? This is the question of the hypothesis class. [00:02:57] Second question: how good is a predictor? The answer is going to be framed in terms of defining a loss function that judges each individual predictor in the hypothesis class. [00:03:11] And finally, how do we actually compute the best predictor? There are a lot of predictors out there, and even if we have the loss function, how do we go searching through them? This is the question of the optimization algorithm. [00:03:25] So this is a recipe that we're going to see over and over again; it's kind of like a build-your-own learning algorithm. [00:03:35] So we're going to start with the first question: what is the hypothesis class? Here is the predictor that we were looking at: f(x) = 1 + 0.57x
[00:03:47] and that corresponds to this red line. [00:03:51] Here's another one, a purple predictor, which has an intercept of 2 and a slope of 0.2. In general you can consider predictors of the form f(x) = w1 + w2 x, for arbitrary w1, the intercept, and w2, the slope. [00:04:15] So now we're going to generalize this using vector notation. Let's take w1 and w2 and pack them together into a vector, which we will call w; this is called the weight vector. [00:04:30] We're also going to define a feature extractor, also known as a feature map, phi. So phi takes an arbitrary input x and returns the vector (1, x), at least in this case, and (1, x) is known as the feature vector. [00:04:48] So now we can simply rewrite the equation up here in vector notation: we're going to write f sub w to denote
that this predictor depends on the weights: f_w(x) = w · phi(x). [00:05:03] This w · phi(x), which we'll see over and over again, is called the score. Here's an example: if you feed 3 into this predictor, then what we're doing is taking the weight vector and dotting it with the feature vector applied to 3. Remember, the feature vector is (1, x), so that's (1, 3) here, and if you take the dot product, 1 times 1 plus 0.57 times 3, that gives you 2.71. [00:05:36] So now, finally, the hypothesis class is defined as the set script F of all predictors f_w, where w is an arbitrary vector (an arbitrary intercept and slope). [00:05:56] Okay, so that defines the hypothesis class that we're going to be working with. [00:06:02] So now let's turn to the second design
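The score computation described here can be sketched in a few lines of numpy (the weight vector (1, 0.57) is the red predictor from the slides):

```python
import numpy as np

def phi(x):
    # feature extractor: maps an input x to the feature vector (1, x)
    return np.array([1.0, x])

def f(w, x):
    # the predictor: the score w . phi(x)
    return w.dot(phi(x))

w = np.array([1.0, 0.57])      # weight vector: (intercept, slope)
print(round(f(w, 3), 2))       # 1*1 + 0.57*3 = 2.71
```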
decision: how good is a predictor? [00:06:07] Let's take the predictor that we were looking at, the red one, and let's look at some training data. This is the training data that we had before; let's plot the predictor and the three data points: (1, 1), (2, 3), and (4, 3). [00:06:23] Intuitively, how good a predictor is is how well it fits the training data, and we're going to quantify that by measuring the distance between the prediction and the target. This difference is called the residual, and we're going to measure the residual for each of our points and take that into account. [00:06:44] Formally, we're going to define a loss function, which is a function of an example (x, y) and a particular weight vector, and it's going to be equal to the prediction f_w(x) minus the target y (that's the residual), squared. So that
is called the squared loss. [00:07:11] As an aside, you could also take the absolute value here, which gives you the absolute deviation loss, but we're going to stick with the squared loss for mathematical convenience. [00:07:19] On these three examples we can compute the loss. We take (1, 1): we dot the weight vector with the feature vector, that's the prediction; we subtract off the target and square it, and that gives you 0.32. The second example is (2, 3) and the third example is (4, 3); each one gives you a loss, which corresponds to the square of the length of these dashed lines. [00:07:51] So now we can define the training loss of a particular weight vector to be simply the average over the losses. Formally, this is going to be a sum over all the examples in our training set of the loss function of each example with respect to that weight vector, and then
finally we're going to divide by the number of points in the training set. [00:08:23] So in this example we just average these three numbers and we get about 0.38. Okay, so that is how we define the squared loss, and the training loss in terms of the squared loss. [00:08:37] So here is the training loss from the previous slide, and we can visualize it: for every single weight vector, we can stick it in and get out a number. Fortunately w is only two-dimensional here, so we can actually plot this: here is the plot over w1 and w2, and every point gives you a training loss on the z-axis. Red denotes high loss, blue denotes low loss, and so it's natural to think about how you would find the point with the minimum training loss. That's captured mathematically as the minimum over w of TrainLoss(w), and this is the optimization problem that we want to solve. [00:09:23] So now the
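Putting the squared loss and its average together, here is a minimal sketch using the lecture's three examples and the red predictor w = (1, 0.57):

```python
import numpy as np

points = [(1, 1), (2, 3), (4, 3)]   # training examples (x, y)

def phi(x):
    return np.array([1.0, x])       # feature vector (1, x)

def loss(w, x, y):
    # squared loss: (prediction - target)^2
    return (w.dot(phi(x)) - y) ** 2

def train_loss(w):
    # average of the per-example losses
    return sum(loss(w, x, y) for x, y in points) / len(points)

w = np.array([1.0, 0.57])
print(round(train_loss(w), 2))      # 0.38 (per-example losses ~0.32, 0.74, 0.08)
```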
third question is: how do we compute the best predictor? Fortunately we already have a well-defined goal: we want to find the weight vector that minimizes the training loss. [00:09:35] We're going to adopt a very simple strategy called "follow your nose": you start at a particular w, you sniff around, and you move in the direction that seems like it's going to reduce your loss the most. [00:09:48] More mathematically, we're going to define the gradient as the direction that increases the training loss the most, and importantly we want to go in the opposite direction, because we want to decrease the training loss, not increase it. [00:10:02] Pictorially, what the follow-your-nose strategy, or gradient descent, looks like is: you start at some w, you follow the gradient, and you end up here; then you're going to
compute the gradient again, end up here, and you might bounce around a bit, but hopefully you'll decrease the loss, on average, over time. [00:10:23] Okay, so here's the pseudocode for gradient descent. We initialize w to be something, let's say all zeros for simplicity. Then we repeat big T times, where big T is the number of epochs: we take our old value of the weight vector and subtract out some eta, which is called the step size (we'll get into it a little bit later), times the gradient of the training loss at w. [00:11:05] Okay, so that's it: there are three lines, and really only one line of interest here, and that's all there is to gradient descent, at least at an abstract level. [00:11:19] So all that remains is to actually compute
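In symbols, the loop body is this single update (eta is the step size, big T the number of epochs):

```latex
w \leftarrow w - \eta \, \nabla_w \mathrm{TrainLoss}(w),
\qquad \text{repeated for } t = 1, \dots, T
```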
the gradient. Remember, here is our objective function: the training loss is the average over the individual losses, where I've expanded out the squared loss; but gradient descent is actually much more general than squared loss, or even machine learning. [00:11:40] Now we just need to compute the gradient, and if you remember your calculus, here's how you do it. We take the gradient with respect to w of TrainLoss(w); there are a lot of symbols here, but remember we are differentiating with respect to w, not x, not y, not phi. [00:11:58] This is equal to the gradient of this expression: the constant out front stays put, the gradient can be pushed inside, and this is a sum, so by linearity the gradient can be pushed inside the sum over the training set. [00:12:15] And now the interesting thing happens: here is something squared, and for the gradient of something squared
you bring down the 2, and then you have the same something, which is, if you remember, the residual, times the gradient of what's inside. What's inside is w · phi(x) − y; phi(x) is a constant and y is a constant, so the gradient of the residual is just phi(x). [00:12:45] Notice there's something interesting I want to point out here: the gradient can be expressed as the residual times the feature vector, where the residual is the prediction minus the target. [00:13:01] So intuitively, if the prediction equals the target, then the gradient is zero and nothing will happen. If the prediction is not equal to the target, then the gradient will be in the direction that pushes the prediction further from the target; and remember we're always minimizing, so we subtract that off, which will move the
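Written out, the derivation described above is:

```latex
\nabla_w \mathrm{TrainLoss}(w)
  = \nabla_w \, \frac{1}{|\mathcal{D}_{\mathrm{train}}|}
    \sum_{(x,y)\in\mathcal{D}_{\mathrm{train}}} \bigl(w \cdot \phi(x) - y\bigr)^2
  = \frac{1}{|\mathcal{D}_{\mathrm{train}}|}
    \sum_{(x,y)\in\mathcal{D}_{\mathrm{train}}}
    2 \underbrace{\bigl(w \cdot \phi(x) - y\bigr)}_{\text{residual}} \, \phi(x)
```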
weights in the right direction. [00:13:24] Okay, so let's walk through gradient descent for our example. Here are the training examples again, here is the expression for the gradient that we just computed on the previous slide, and here is the gradient update, where I've taken the liberty of substituting the step size 0.1, just for simplicity. [00:13:43] So we start with w = (0, 0), and then we plug (0, 0) into this training-loss gradient expression, this somewhat nasty-looking thing. It's simply the average over the three examples: one third times the sum of the first, second, and third examples, where each example contributes the prediction minus the target, times the feature vector of that example. [00:14:14] I'll let you go through the details here, but if you do the math you get
(-4.67, -12.67); you multiply by the step size and you get this weight vector. [00:14:29] Okay, so in the second iteration you take this weight vector, stick it into the expression all over again, compute a new gradient, subtract that gradient times 0.1 from the weight vector, and you get a new weight vector. [00:14:46] Then you keep repeating and repeating, and after maybe 200 iterations you end up with something like this. Something interesting happens: if you're lucky, the gradient at the end will be zero. What does zero mean? If you subtract zero, you get the same thing, which means that gradient descent has converged: by subtracting off the gradient you're not going to move anywhere, so you might as well stop. The stopping point is the weight vector (1, 0.57), which is indeed the red predictor.
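The first update can be checked numerically; here is a sketch with step size 0.1, starting from w = (0, 0), as in the lecture:

```python
import numpy as np

points = [(1, 1), (2, 3), (4, 3)]

def phi(x):
    return np.array([1.0, x])

def gradient_train_loss(w):
    # average of 2 * residual * feature vector over the training set
    return sum(2 * (w.dot(phi(x)) - y) * phi(x) for x, y in points) / len(points)

w = np.zeros(2)
g = gradient_train_loss(w)
print(np.round(g, 2))    # about (-4.67, -12.67)
w = w - 0.1 * g          # first gradient-descent step
print(np.round(w, 3))    # about (0.467, 1.267)
```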
So, just to concretize this even more, let's do gradient descent in Python. [00:15:30] Okay, so I'm going to pull out the terminal here. In practice you probably wouldn't implement gradient descent from scratch, except if you're just trying to learn about gradient descent, but for pedagogical purposes let me do this. [00:15:47] I'm going to do this in a very bare-bones way, using numpy rather than PyTorch or something that can do gradients for you. [00:16:00] First I'm going to define our training examples as (1, 1), (2, 3), and (4, 3).
i believe those are the training [00:16:08] i believe those are the training examples let me just double check over [00:16:11] examples let me just double check over here one one two three four three [00:16:13] here one one two three four three okay so now i have to define a feature [00:16:16] okay so now i have to define a feature vector of x which is remember is one x [00:16:21] vector of x which is remember is one x this is just a [00:16:23] this is just a numpy array [00:16:24] numpy array um i'm going to initialize uh the weight [00:16:28] um i'm going to initialize uh the weight vector with let's call this initial wave [00:16:30] vector with let's call this initial wave vector [00:16:33] vector and this is just going to be all zeros [00:16:35] and this is just going to be all zeros vector of dimension two which is going [00:16:38] vector of dimension two which is going to match the dimensionality of phi [00:16:42] okay so now i need to define the [00:16:44] okay so now i need to define the training loss [00:16:46] training loss training loss takes away vector [00:16:48] training loss takes away vector and i'm going to actually go to the [00:16:51] and i'm going to actually go to the previous slide here and it's just [00:16:53] previous slide here and it's just basically copying down math and turning [00:16:55] basically copying down math and turning it into code [00:16:57] it into code so this is 1 over [00:16:59] so this is 1 over the number of training examples [00:17:03] the number of training examples times the sum [00:17:06] times the sum um and the sum is over [00:17:08] um and the sum is over for all training examples x y and [00:17:11] for all training examples x y and training examples [00:17:13] training examples and for each one i'm going to do w [00:17:16] and for each one i'm going to do w uh dot e of x it's really literally the [00:17:20] uh dot e of x it's really literally the same thing minus y [00:17:21] same thing minus y and i'm going to 
take this expression [00:17:23] and i'm going to take this expression the residual and i'm going to square it [00:17:27] the residual and i'm going to square it okay so let's make this a little bit [00:17:28] okay so let's make this a little bit bigger [00:17:29] bigger okay so that's the training loss [00:17:32] okay so that's the training loss um okay now i need to take the gradient [00:17:35] um okay now i need to take the gradient so i'm going to cheat a little bit and [00:17:36] so i'm going to cheat a little bit and just copy that down here and edit it [00:17:39] just copy that down here and edit it so the gradient of the training loss is [00:17:42] so the gradient of the training loss is going to be [00:17:44] going to be 2 times the residual [00:17:47] 2 times the residual times [00:17:48] times e of x [00:17:50] e of x okay so that's it for the training loss [00:17:53] okay so that's it for the training loss okay so now i'm going to implement [00:17:55] okay so now i'm going to implement gradient uh descent [00:17:58] gradient uh descent so [00:17:58] so i'm going to [00:18:01] i'm going to um [00:18:02] um do it this way actually so gradient [00:18:04] do it this way actually so gradient descent [00:18:05] descent like i alluded to before is actually a [00:18:07] like i alluded to before is actually a general purpose optimization so all it [00:18:10] general purpose optimization so all it needs is a function [00:18:12] needs is a function gradient access to that function an [00:18:14] gradient access to that function an initial wave vector and it's ready to go [00:18:16] initial wave vector and it's ready to go okay so i'm going to [00:18:19] okay so i'm going to initialize w to the initial wave vector [00:18:22] initialize w to the initial wave vector and then i'm going to let's say eta to [00:18:25] and then i'm going to let's say eta to 0.1 [00:18:26] 0.1 um for [00:18:28] um for a number of iterations t [00:18:31] a number of iterations t in range of 
let's just say i know 500 [00:18:34] in range of let's just say i know 500 just for fun [00:18:36] just for fun um i'm going to [00:18:38] um i'm going to uh [00:18:39] uh evaluate the function at w [00:18:42] evaluate the function at w i'm going to evaluate the gradient [00:18:45] i'm going to evaluate the gradient um and i'm just going to do the one line [00:18:49] um and i'm just going to do the one line thing of [00:18:50] thing of subtracting out a to times the gradient [00:18:52] subtracting out a to times the gradient from the existing wave vector and [00:18:54] from the existing wave vector and setting it to the new way vector [00:18:57] setting it to the new way vector and i'm going to print out [00:19:01] where i am so f of t [00:19:04] where i am so f of t w equals w [00:19:06] w equals w f of w equals uh the value [00:19:10] f of w equals uh the value and let's do the gradient just one so [00:19:13] and let's do the gradient just one so grad [00:19:15] grad gradient f [00:19:16] gradient f equals um the gradient [00:19:19] equals um the gradient okay [00:19:21] okay okay so now i just need to call gradient [00:19:23] okay so now i just need to call gradient descent [00:19:24] descent with [00:19:26] with what function am i optimizing the train [00:19:28] what function am i optimizing the train loss [00:19:30] loss the gradient of the train loss [00:19:33] the gradient of the train loss is the gradient of the train loss and [00:19:36] is the gradient of the train loss and the initial weight vector [00:19:38] the initial weight vector okay [00:19:40] okay so [00:19:41] so uh that's all i have and let's actually [00:19:43] uh that's all i have and let's actually just [00:19:44] just you know run it [00:19:47] gradient descent [00:19:48] gradient descent um [00:19:50] um so we see here that in f x 0 the wave [00:19:53] so we see here that in f x 0 the wave vector [00:19:54] vector is something [00:19:55] is something and the function value is something 
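The whole demo can be reconstructed roughly as follows. This is a sketch, not the lecture's exact code; the function names (trainLoss, gradientTrainLoss, gradientDescent) are my guesses at what was typed:

```python
import numpy as np

# Training examples (x, y) and the feature map phi(x) = [1, x]
points = [(1, 1), (2, 3), (4, 3)]

def phi(x):
    return np.array([1.0, x])

d = 2  # dimensionality of phi
initialWeightVector = np.zeros(d)

def trainLoss(w):
    # Average squared residual over the training examples
    return 1.0 / len(points) * sum((w.dot(phi(x)) - y) ** 2 for x, y in points)

def gradientTrainLoss(w):
    # Gradient of the squared residual: 2 * residual * phi(x)
    return 1.0 / len(points) * sum(2 * (w.dot(phi(x)) - y) * phi(x) for x, y in points)

def gradientDescent(F, gradientF, initialWeightVector):
    # General-purpose: only needs a function, its gradient, and a start point
    w = initialWeightVector
    eta = 0.1  # step size
    for t in range(500):
        value = F(w)
        gradient = gradientF(w)
        w = w - eta * gradient
        if t % 100 == 0:
            print(f't = {t}: w = {w}, F(w) = {value}, gradientF = {gradient}')
    return w

w = gradientDescent(trainLoss, gradientTrainLoss, initialWeightVector)
print(w)  # converges to roughly [1, 0.57], as in the lecture
```

Note how the problem specification (trainLoss, gradientTrainLoss) is separate from the optimizer (gradientDescent), which mirrors the lecture's point about decoupling the "what" from the "how".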
[00:19:50] So we see here that at iteration 0 the weight vector is something and the function value is something greater than something, and over time the function value decreases, which is a good sign. [00:20:04] The gradient of F starts becoming (0, 0), and the weight vectors are converging to (1, 0.57), as advertised. So I will declare this program working. [00:20:21] Let's just summarize what we did here. I want to set this up as follows: here is the optimization problem, which is the training examples, the feature vectors, the loss and gradient and so on, and this is a specification of what the problem we want to solve is. And then down here we have the optimization algorithm. [00:20:48] We're going to be doing this a few times throughout the course, drawing modules where we can separate the optimization problem from the optimization algorithm. Notice that the optimization algorithm doesn't depend on anything relating to machine learning at all, and the optimization problem doesn't say anything about how you solve it. So it's decoupling the what from the how, which I think is a really important thing. [00:21:16] Okay, so that was gradient descent in code, and let's summarize now. In summary, we take training data, and we have a learning algorithm that produces a predictor that can make predictions on new inputs. [00:21:31] And there are three design decisions to build your own learning algorithm. Which predictors are possible? That is the question of the hypothesis class, and we considered linear functions here, where the function is simply w dot phi of x with a particular feature map (1, x). You could imagine other things, and we'll see other things later, non-linear features and even neural networks, but it's still the question of what is the hypothesis class. [00:22:01] How good is a predictor? That's the question of what is the loss function. For regression we looked at the squared loss; later, for classification, we're going to look at the hinge loss and the zero-one loss. But this is orthogonal: for neural networks we can also use the hinge loss, the squared loss, or any of the losses. [00:22:20] And finally, how do we compute the best predictor? This is the question of what is the optimization algorithm, and for this we introduced gradient descent, which is this lovely, simple, and very effective algorithm for optimization. [00:22:36] So that concludes this module.

================================================================================ LECTURE 006 ================================================================================

Artificial Intelligence & Machine learning 3 - Linear Classification | Stanford CS221 (Autumn 2021)
Source: https://www.youtube.com/watch?v=WcaMiqJR09s

--- Transcript

[00:00:05] Hi, this module is about linear classification. We're going to go through linear classification via a simple example, just like we did for linear regression.
[00:00:14] So, as before, we have training data, which consists of a set of examples, and each example is now going to be an input (x1, x2) followed by a label y. [00:00:28] So we have three examples here: the input (0, 2) has output 1, (-2, 0) has output 1, and (1, -1) has output -1. [00:00:43] We can visualize these points, just the input part, on a 2D diagram, where I'm plotting x1 by x2. So here is (0, 2), and I'm coloring it orange to denote that it's a positive point; this is (-2, 0), also orange because it's positive; and here is (1, -1), which is blue because it's labeled as negative. [00:01:13] So given these points, we want to design a learning algorithm that can output a predictor, which in classification is known as a classifier. [00:01:22] And this classifier can take new inputs, crank them through, and produce an output label. [00:01:30] This is demonstrated on the plot as follows: the classifier in classification is going to be represented by a decision boundary. This decision boundary carves up the space into a region where the points are labeled positive and a region where the points are labeled negative. So (2, 0) is going to be predicted as a minus one in this case. [00:02:03] Okay, as before, we have three design decisions we need to settle. First, which classifiers are possible? This is a question of the hypothesis class we're going to consider: are the decision boundaries going to be straight, or can they be curved? [00:02:20] Second, how good is a classifier? This is a question of the loss function. And third, how do we compute the best classifier, a.k.a. the classifier with the lowest loss? That's going to be a question of the optimization algorithm. [00:02:37] So before we begin talking about the design space of the hypothesis class, I'm going to focus on an example linear classifier here. [00:02:49] So we have f of x equals: I'm going to define this weight vector w to be (-0.6, 0.6), and I'm going to take the dot product with a feature vector, which is going to be just the identity feature map, mapping to (x1, x2); remember, x is now a two-dimensional list of two numbers. [00:03:23] And then I'm going to take this dot product and take the sign, and remember the sign of a scalar is equal to plus one if that scalar is positive, minus one if it's negative, and zero if it is zero. [00:03:47] Okay, so let's see what this classifier does on some points. Each point is (x1, x2), so let's look at (0, 2). [00:03:59] Okay, so let's look at where (0, 2) is on the plot: (0, 2) is right here, and I'm going to represent it by this vector here. This vector is phi of x, and w is going to be this vector here; that's the weight vector. [00:04:18] And the dot product, remembering from linear algebra, is proportional to the cosine of this angle; in particular, the dot product is positive if and only if this angle is acute, and it's negative if the angle is obtuse. In this case it is acute, so this point is going to be classified as positive. [00:04:43] So let's take another point, (-2, 0). (-2, 0) is here, and this angle is also acute, so this point is also labeled as positive. [00:04:58] And the third point is (1, -1); (1, -1) is over here, and now this angle between the red and the blue is obtuse, therefore the sign is negative. [00:05:13] So you can understand how a classifier behaves geometrically, but you can also do this symbolically by following the math. If you plug in the first point, (0, 2), the dot product is 1.2, you take the sign, and you get 1. If you take the second point, the dot product is also 1.2, and you get 1. And if you take the third point, the dot product is minus 1.2, and the sign of minus 1.2 is minus 1. [00:05:45] Okay, so you can kind of see the pattern now. Any point over here that forms an acute angle with this weight vector (-0.6, 0.6) is going to be labeled as positive, and anything that forms an obtuse angle with this weight vector is going to be labeled as negative. [00:06:06] And the decision boundary is exactly those points that are perpendicular, and indeed you can see that this is a right angle here. These are the points for which the classifier just doesn't know whether it's positive or negative.
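The symbolic calculation above can be checked with a few lines of NumPy; a minimal sketch, with variable names of my own choosing:

```python
import numpy as np

w = np.array([-0.6, 0.6])  # the red classifier's weight vector

def phi(x):
    # Identity feature map: phi(x) = (x1, x2)
    return np.array(x, dtype=float)

def f(x):
    # Linear classifier: the sign of the dot product w . phi(x)
    return int(np.sign(w.dot(phi(x))))

for x in [(0, 2), (-2, 0), (1, -1)]:
    print(x, w.dot(phi(x)), f(x))
# (0, 2) and (-2, 0) get score 1.2 and label +1; (1, -1) gets score -1.2 and label -1
```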
[00:06:23] Okay, so that was one particular classifier, this one, but we can imagine other ones. We can imagine this purple classifier, which has weights (0.5, 1), and that corresponds to this point here. [00:06:42] So that is (0.5, 1), and remember, the decision boundary is the thing that is perpendicular, or normal, to the weight vector, and in 2D it's given by this line. So this purple classifier will classify all of these points plus and all of these points minus. [00:06:59] In general, the binary classifier f sub w, where w is the weights, on a particular input x is equal to: you take the dot product, and then you take the sign of that dot product. [00:07:19] And the hypothesis class, as before, is just simply the set of all possible classifiers obtained by ranging the weights over any two real numbers. [00:07:32] So that's the hypothesis class. Now let's go on to the second design decision: what is a good loss function? [00:07:39] Okay, so let's take our purple classifier and some training data, and we're going to evaluate how good this classifier is on this training data. [00:07:54] Let's go through this. Here's the classifier, and the first point is (0, 2), which was labeled as plus one; that is this point over here. [00:08:10] And this classifier predicted correctly, because it's on this side: it has a positive label, and the classifier also thinks it's positive. So we expect low loss. [00:08:22] Whereas this point over here, (-2, 0), is labeled as positive, but it's on the other side of the decision boundary, and therefore it's classified incorrectly. [00:08:37] This point, (1, -1), is over here, and it's labeled in the training data as minus; it is on this side of the decision boundary, so it's predicted as minus, therefore it is labeled correctly as well. [00:08:53] So to formalize this, we're going to define something called the zero-one loss. Just like any loss function, it takes in a particular example and a weight vector, and it looks at the prediction and the target and asks: do they disagree? [00:09:11] If they disagree, then this indicator function will return one, and if they agree, then the indicator function returns zero. So this is the zero-one loss. [00:09:22] Mathematically, you can walk through these calculations: you plug in the first point and you look at the sign; the dot product here is going to be two, and the sign of two is one. They don't disagree, so that's a zero. The second point: they do disagree, so the loss is one. And the third point: they don't disagree, so the loss is zero. [00:09:51] And as before, the training loss over the entire training set of examples is just simply the average of the per-example losses, and in this case it's one third. [00:10:07] So before we move on to the design decision of how to optimize the loss function, let's spend some time understanding two important concepts, because we can rewrite the zero-one loss in a slightly different way. [00:10:19] Recall that the predicted label on a particular input is the sign of the dot product, and the target label is y. [00:10:31] So the score, which is something we've seen before: the score on an example is simply this expression, the dot product inside the sign. And while the sign is just one or minus one, the score is a real number which intuitively represents how confident we are in predicting plus one. [00:10:56] So points over here have a large dot product with this purple weight vector and have a high score; ones on the decision boundary have zero score; ones over here have very negative scores. [00:11:10] The second concept is that of the margin, which takes into account the target label. The margin on an example is simply the score times the correct target label, and this measures how correct we are. [00:11:26] Notice that you can be confident but not correct, an important life lesson. So if y is positive, then the margin is going to be high when this score is hugely positive, and if y is minus one, then the margin is going to be high when this score is hugely negative. [00:11:52] Okay, so with these two definitions in mind, we can now look at the zero-one loss again. Remember, the zero-one loss is an indicator of whether the prediction and target disagree, but now we can represent it in terms of the margin: it's basically the indicator of whether the margin is less than or equal to zero.
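These definitions can be sketched in code; the purple classifier's weights (0.5, 1) and the lecture's training set are assumed, and the names are my own:

```python
import numpy as np

w = np.array([0.5, 1.0])  # the purple classifier's weights

def phi(x):
    # Identity feature map
    return np.array(x, dtype=float)

def score(x):
    # A real number: how confident we are in predicting +1
    return w.dot(phi(x))

def margin(x, y):
    # Score times the target label: how correct we are
    return score(x) * y

def zeroOneLoss(x, y):
    # 1 if the prediction and target disagree, i.e. the margin is <= 0
    return 1 if margin(x, y) <= 0 else 0

points = [((0, 2), 1), ((-2, 0), 1), ((1, -1), -1)]
losses = [zeroOneLoss(x, y) for x, y in points]
print(losses, sum(losses) / len(points))  # [0, 1, 0] and a training loss of 1/3
```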
[00:12:15] positive margin means that we're classifying correctly a negative margin [00:12:17] classifying correctly a negative margin means that we're classifying incorrectly [00:12:20] means that we're classifying incorrectly and we can visualize this [00:12:22] and we can visualize this as follows so here i'm plotting the [00:12:24] as follows so here i'm plotting the margin [00:12:25] margin against the loss [00:12:27] against the loss and if the margin is positive greater [00:12:30] and if the margin is positive greater than zero the loss is zero and if the [00:12:33] than zero the loss is zero and if the margin is less or equal to zero then the [00:12:36] margin is less or equal to zero then the loss is one [00:12:41] okay so that is zero one losses [00:12:43] okay so that is zero one losses expressed in the margin [00:12:45] expressed in the margin okay so now let's optimize the third [00:12:47] okay so now let's optimize the third design decision let's do [00:12:49] design decision let's do um [00:12:50] um optimize the training loss [00:12:52] optimize the training loss we want to find the minimum weight [00:12:54] we want to find the minimum weight vector that minimizes this expression [00:12:56] vector that minimizes this expression which is the average of the individual [00:12:57] which is the average of the individual losses [00:12:58] losses and let's just use gradient descent as [00:13:01] and let's just use gradient descent as we did before [00:13:02] we did before and to do it we have to compute the [00:13:03] and to do it we have to compute the gradients so the gradient of the [00:13:05] gradients so the gradient of the training loss is equal to the sum over [00:13:07] training loss is equal to the sum over the gradient of the individual losses [00:13:10] the gradient of the individual losses you look at the individual losses take [00:13:12] you look at the individual losses take the gradient and now you have to take [00:13:14] the gradient and 
[00:13:16] And now you have to take the gradient with respect to this indicator function, and that's where things go wrong. If you remember what this loss looks like, it looks like a step function, right? And what's the gradient of this function? Well, it's zero almost everywhere: zero, zero, zero, zero, then there's this discontinuity where it's undefined, and then zero, zero, zero, zero again. So remember what gradient descent is trying to do: it computes the gradient and then moves in that direction, and if the gradient is zero, gradient descent just gets stuck and can't go anywhere. So gradient descent will not work on the zero-one loss.

[00:13:56] One technical note: if someone asks you why you can't do gradient descent on the zero-one loss, your initial reaction might be "because it's not differentiable," and that is true, but it's only non-differentiable at one point.
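You can see the stuck behavior numerically: a finite-difference estimate of the zero-one loss's gradient comes out zero at (almost) any weight vector, so a gradient step goes nowhere. This little check is my own illustration, not from the lecture:

```python
def zero_one_loss(w, phi, y):
    # indicator of the margin (w . phi(x)) * y being <= 0
    return 1 if sum(wi * pi for wi, pi in zip(w, phi)) * y <= 0 else 0

def numeric_grad(f, w, eps=1e-6):
    # central finite differences in each coordinate of w
    g = []
    for i in range(len(w)):
        w_hi, w_lo = w[:], w[:]
        w_hi[i] += eps
        w_lo[i] -= eps
        g.append((f(w_hi) - f(w_lo)) / (2 * eps))
    return g

phi, y = [1.0, 2.0], +1
f = lambda w: zero_one_loss(w, phi, y)
print(numeric_grad(f, [0.5, -0.3]))   # [0.0, 0.0]: no direction to move in
```

A tiny perturbation of w almost never flips the sign of the margin, so the loss value doesn't change and the estimated gradient is exactly zero.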
[00:14:11] The real reason is that the gradient is zero everywhere, and with a zero gradient you just can't make any progress.

[00:14:19] So how do you fix this? There are a few things you can do, but one example is what is called the hinge loss. Pictorially, the hinge loss is just another loss function, the one in green here, and I'm plotting it on this margin-versus-loss plot: the zero-one loss looks like this, and the hinge loss looks like that. It's the maximum of two lines: one is this descending line, and one is this flat line at zero.

[00:14:49] Okay, so formally, what is this? The hinge loss is equal to the max over two things: the first is one minus the margin (this complicated expression is just the margin), and the second is the zero function.
[00:15:15] These two arguments to the max correspond to the two regions of the hinge loss. Okay, so let's interpret this a little bit. If the margin is greater than or equal to one, then the hinge loss is zero, but once the margin starts dipping below one, the hinge loss starts growing linearly with the margin violation.

[00:15:37] And why is there a one here and not a zero? Well, this is because we ask the classifier to predict not only correctly, but by a positive margin of safety. And just an aside: this one could really be two or three or any number, as long as it's positive, and its magnitude effectively determines the regularization strength if you're using regularizers. Don't worry if you didn't get that.

[00:16:04] Okay, so also notice that the hinge loss is an upper bound on the zero-one loss. This is cool, because suppose you optimize the hinge loss and you drive it down.
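Written out as a function of the margin, along with a quick check of the upper-bound property just mentioned (a sketch; the names are mine):

```python
def hinge_loss(margin):
    # max(1 - margin, 0): zero once the margin clears the safety threshold of 1
    return max(1 - margin, 0)

def zero_one_loss(margin):
    return 1 if margin <= 0 else 0

print(hinge_loss(2.0))   # 0: comfortably correct
print(hinge_loss(0.5))   # 0.5: correct, but inside the margin of safety
print(hinge_loss(-1.0))  # 2: misclassified; grows linearly with the violation

# the hinge loss upper-bounds the zero-one loss at every margin
for m in [-2.0, -0.5, 0.0, 0.5, 1.0, 3.0]:
    assert hinge_loss(m) >= zero_one_loss(m)
```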
[00:16:16] Driving it down is going to start pushing down on the zero-one loss, more or less. In particular, if you get the hinge loss to zero, then what is the zero-one loss? Well, it's also going to be zero, so that's a nice fact.

[00:16:27] So here's a minor digression: there are a lot of other loss functions. Here is the logistic loss, and we can just plot it on this diagram. You see that the logistic loss doesn't have this kink in it; it has a smooth transition from something that's growing linearly to something that fades away to zero. The key property of the logistic loss is that even if you're out here, say with a margin of 2, where you're classifying correctly and the hinge loss would say you get zero loss and don't need to do anything, the logistic loss is greedy: it says, well, you still have a little bit of a loss, and if you try to minimize the logistic loss you're just going to keep pushing the margin out as far as possible.
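The logistic loss being described is log(1 + e^(-margin)); unlike the hinge, it never reaches exactly zero, so there is always a little pressure to grow the margin. A small sketch (names mine):

```python
import math

def hinge_loss(margin):
    return max(1 - margin, 0)

def logistic_loss(margin):
    # log(1 + e^(-margin)): smooth everywhere, strictly positive
    return math.log(1 + math.exp(-margin))

print(hinge_loss(2.0))      # 0: the hinge is satisfied, no pressure to improve
print(logistic_loss(2.0))   # ~0.127: still some loss, keeps pushing the margin out
```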
[00:17:15] So the logistic loss is differentiable everywhere and smooth, which is nice, and it's typically known as logistic regression because it has connections to probability.

[00:17:26] Okay, so let's now go back to the hinge loss. Here is our friend the hinge loss, and here is the expression for it. Remember, it's the maximum of two expressions: this decreasing line part and the zero part, in orange and blue respectively.

[00:17:46] Okay, so now if we want to apply gradient descent to the hinge loss, we have to take the gradient. So how do we take the gradient? The gradient of the hinge loss is equal to... and now we have this max thing, which might be a little bit scary, but if you look up here, we can just do this kind of visually. What is the slope here? Well, it's whatever the slope of the orange part is, and what is the slope over here? It's the slope of this blue part.
[00:18:18] And so now we just have to switch between the two cases. In particular, if one minus the margin (the orange part) is greater than zero, that means we're in this region, and the gradient is just going to be the gradient of this expression. One is a constant, we're differentiating with respect to w, and phi of x and y are constants, so it's just going to be minus phi of x times y. And if this condition doesn't hold, otherwise, that means we're in this region, and what is the gradient of zero? Well, that's the world's easiest differential calculus problem, and it's zero.

[00:19:08] Okay, so this is the gradient of the hinge loss. And just to sanity check things: if you pick up an example and it's on this side over here, then the gradient is going to be zero and you're not going to update your weights.
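That case analysis can be written directly (a sketch; the helper name is mine):

```python
def hinge_gradient(w, phi, y):
    """Gradient of max(1 - (w . phi) * y, 0) with respect to w."""
    margin = sum(wi * pi for wi, pi in zip(w, phi)) * y
    if 1 - margin > 0:                      # the sloped (orange) piece is active
        return [-pi * y for pi in phi]      # gradient is -phi(x) * y
    return [0.0] * len(w)                   # the flat (blue) piece: gradient 0

w = [0.0, 1.0]                              # hypothetical weights
print(hinge_gradient(w, [0.0, 2.0], +1))    # margin 2 >= 1 -> [0.0, 0.0], no update
print(hinge_gradient(w, [-2.0, -1.0], +1))  # margin -1 < 1 -> [2.0, 1.0]
```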
[00:19:23] On the other hand, if you are over here, then the gradient will be non-zero; in particular, it's going to be minus phi of x times y.

[00:19:34] Okay, so now let's put things together and revisit our example. Here's the purple classifier over here, and here we have some training data, and we're going to try to compute the hinge loss on this training data along with its gradient. Remember, the hinge loss is this expression.

[00:19:56] So let's look at the first point, (0, 2). (0, 2) is here, and it's labeled as a positive. If you go and plug that point into the hinge loss, you get a max over one minus the margin, and zero. So what is the margin here? Well, it's this dot product, which happens to be two. So we have 1 minus 2, which is minus 1, and the max of minus 1 and 0 is 0.
[00:20:28] And that agrees with our intuition that the loss here should be 0, because it's correctly classified, and correctly classified by a margin of 2.

[00:20:36] Now let's look at the second point, (-2, 0). If we compute the loss here, we see that the loss is actually 2, and that makes sense, because we misclassified this point.

[00:20:59] So now if we look at the third point here, (1, -1), then the loss on this example is 1.5. So notice that even though we're classifying this point correctly, we're still incurring a loss, because the margin was only 0.5 and didn't meet the threshold. Or, I guess, sorry: the margin is 0.5, but the loss is 1.5.

[00:21:34] So now we can also compute the gradients here.
[00:21:43] The gradient on the first point is zero because the loss is zero, and generally (not always), if the loss is zero then the gradient will be zero as well. On the second one the loss is not zero, so we have a non-zero gradient, which is minus phi of x times y: it's this part times the minus sign. And the third point also has positive loss, so it has a non-zero gradient, (1, -1).

[00:22:15] Now we can compute the train loss, which is the average over the losses; that gives us 1.17. And the gradient of the train loss is just the average of the gradients, which gives us (1, -0.33).

[00:22:32] Okay, so let us now move on and concretize this in Python. Remember, last time we coded up gradient descent for linear regression, so now I'm going to just copy this, call it gradient descent hinge, and do it for the hinge loss. I'm going to use this as a starting point.
[00:23:06] I'm going to just change a few things. Let's change the training examples, because now we're working with this training data. Just to keep track of things, these are (x, y) pairs: so x now is (0, 2) with label 1, then the second point is (-2, 0), and the third point is (1, -1). So we have three points (x, y), where x is a tuple. Okay, so phi is just going to be x, and the dimension of the weight vector is still two.

[00:23:52] And now the key thing we have to do is change the definition of the loss. Before we had the average over a sum here, and instead of the squared loss I'm going to make this the hinge loss, which is the max over 1 minus the margin, and 0.
[00:24:21] All right, so this is the max of one minus the margin and zero. And the gradient of that... let's actually just copy this down, and delete this so we don't confuse ourselves. So remember, if this first expression is greater than 0, then the gradient is minus phi of x times y, if we're on that side of the curve, and otherwise it's just going to be zero.

[00:25:02] Okay, so that's it: we just changed the training examples and changed the definition of the loss function, and the optimization algorithm we don't actually have to change at all. Okay, so let's run this and see what we get.

[00:25:24] Okay, so here you see it starts out with w equals zero, and then it starts moving w to (0.5, -0.55). You see that the train loss is decreasing nicely, and actually in this case it gets to zero. Remember, the hinge loss is an upper bound on the zero-one loss, so that means the zero-one loss is also zero.
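Here is a self-contained sketch of what that modified script does: the same generic gradient descent loop, the hinge loss, and the three training points. The label assignments are my reading of the audio, so the converged weights may differ from the exact numbers spoken, but the training loss does reach zero since the data is separable:

```python
# Gradient descent on the average hinge loss, in the spirit of the lecture's demo.
points = [((0, 2), +1), ((-2, 0), +1), ((1, -1), -1)]   # (x, y) pairs (labels assumed)
d = 2                                                    # dimension of w
phi = lambda x: list(x)                                  # featurizer: phi(x) = x

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def train_loss(w):
    return sum(max(1 - dot(w, phi(x)) * y, 0) for x, y in points) / len(points)

def train_gradient(w):
    g = [0.0] * d
    for x, y in points:
        if 1 - dot(w, phi(x)) * y > 0:       # hinge active: gradient is -phi(x)*y
            for i in range(d):
                g[i] -= phi(x)[i] * y
    return [gi / len(points) for gi in g]

w, eta = [0.0] * d, 0.1
for _ in range(50):
    w = [wi - eta * gi for wi, gi in zip(w, train_gradient(w))]
print(w, train_loss(w))    # the data is separable, so the loss reaches 0
```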
[00:25:49] And the gradient also vanishes and becomes zero, meaning that we converged.

[00:25:56] Okay, so just to recap: all we did here was change the training examples, the featurizer, and the definition of the loss, and it's great that we didn't have to touch the optimization algorithm, because it was meant to be a generic piece of code.

[00:26:15] All right, so let us summarize, and in particular I'm going to contrast regression with classification, since we've seen the two of them so far. The key quantity that drives the prediction in both cases is the score, the dot product between the weight vector and the feature vector. In regression the prediction is exactly the raw score, while in classification you stick it through the sign function, so you get a one or a minus one.
[00:26:53] How is the prediction related to the target? Well, in regression we looked at the residual, which was the score minus y, and in classification we're looking at the margin. So in regression a low residual is good; in classification a high margin is good, because we want the score and y to have the same sign.

[00:27:12] Using those quantities we can define loss functions. In regression we looked at the squared loss, but as I mentioned briefly, you can also do the absolute deviation loss. In classification the story becomes a little bit stranger, because we generally care about the zero-one loss, that's our misclassification rate, but we can't optimize it, so we have to come up with a surrogate loss function, like the hinge loss, which we went into in depth, and the logistic loss, which we briefly mentioned.

[00:27:41] And given the loss functions, in both cases we use the gradient descent algorithm to optimize the loss function.
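The contrast in that summary, in code form (the numbers are made up for illustration):

```python
# Same score, two uses: regression predicts it raw, classification takes its sign.
w, phi = [0.4, -0.2], [1.0, 3.0]              # hypothetical weights and features
score = sum(wi * pi for wi, pi in zip(w, phi))

# Regression: prediction is the raw score; quality is the residual (low is good).
y_reg = 1.0
residual = score - y_reg

# Classification: prediction is the sign; quality is the margin (high is good).
y_cls = -1
prediction = +1 if score >= 0 else -1
margin = score * y_cls

print(score, residual, prediction, margin)    # score -0.2 -> predicts -1, margin 0.2
```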
[00:27:53] That's it. That concludes the unit on linear classification. Thanks for listening.

================================================================================ LECTURE 007 ================================================================================
Artificial Intelligence & Machine Learning 4 - Stochastic Gradient Descent | Stanford CS221 (2021)
Source: https://www.youtube.com/watch?v=bl2WgBLH0tI
---
Transcript

[00:00:05] Hi, in this lecture I'm going to talk about stochastic gradient descent.

[00:00:11] So recall gradient descent, which was the optimization algorithm that we decided on for optimizing all our training losses for classification and regression. Recall that the training loss is an average over all the examples in the training set of the per-example losses. So gradient descent works as follows: we're going to initialize the weight vector to 0, and then we're going to repeat t times the following update: we take the old weight vector and subtract out the step size times the gradient of the training loss.
[00:00:56] Now, this looks very simple, but if you unpack what this gradient is, it's actually an average over the gradients of the per-example losses. So now imagine you have a data set with a million examples: computing a single gradient is going to involve looping over all million examples just to get a single update, and then you take a step, and then you have to do it all over again. So this is why gradient descent is slow: it requires going through all the training examples just to make one update.

[00:01:31] So what can we do about this? The answer is stochastic gradient descent. Here is the same training loss function, and stochastic gradient descent is going to work as follows: we initialize the weight vector to zero, and then we iterate t times, and now on each epoch we're going to loop over the training examples.
[00:02:03] And after each example we perform an update on the individual loss. So here, instead of going through the training set and performing one update, we're going to go through the training set and after each example we're going to perform an update. This is going to be a lot faster in terms of the number of updates being large. Of course, there is a trade-off, because each update itself is not going to be as high quality: it only consists of one example, as opposed to all of the examples.

[00:02:41] And that's it for stochastic gradient descent. I want to talk about one small note, which is the step size. Recall the update includes a step size, which determines how far in the direction of the gradient, or away from the gradient, you want to move.

[00:03:01] Okay, so what should eta be? In general there's not really one satisfying answer to this.
[00:03:09] It's usually a hyperparameter that has to be tuned via trial and error, but here's some general guidance. The step size has to be greater than or equal to zero. If it is small, that means you're taking little steps, but your algorithm is going to be more stable and less likely to bounce around. As you increase eta larger and larger, you're taking more aggressive steps, so you can move faster, but perhaps at the risk of being a bit more unstable.

[00:03:38] So, two typical strategies for setting the step size: one is using just a constant step size; so far we've used eta equals 0.1, a kind of arbitrary number. Or you can do a decreasing step size, where eta is one over the number of updates that you've made. The intuition here is that in the beginning you're far away from the optimum, so you want to move quickly.
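Both ideas, one update per example and a decreasing step size, fit in a few lines. The dataset below is my own synthetic one (labels generated from a known weight vector, so we can tell whether learning worked); it is not the one built in the lecture:

```python
# Stochastic gradient descent on the hinge loss with a decreasing step size.
# Labels come from sign(x1 - 2*x2), i.e. a hidden vector true_w = [1, -2].
data = [((x1 / 10, x2 / 10), +1 if x1 - 2 * x2 >= 0 else -1)
        for x1 in range(-10, 11) for x2 in range(-10, 11)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def train_loss(w):
    return sum(max(1 - dot(w, x) * y, 0) for x, y in data) / len(data)

w = [0.0, 0.0]
num_updates = 0
for epoch in range(3):
    for x, y in data:                      # one update per example, not per pass
        num_updates += 1
        eta = 1 / num_updates              # decreasing step size
        if 1 - dot(w, x) * y > 0:          # hinge gradient: -x*y if active, else 0
            w = [wi + eta * xi * y for wi, xi in zip(w, x)]

print(num_updates, train_loss(w))          # 3 epochs -> 3 * len(data) updates
```

After three epochs the training loss is below its starting value of 1.0 (the loss at w = 0, where every margin is zero). With gradient descent the same number of passes would have produced only three updates.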
[00:04:15] So now let us explore stochastic gradient descent in Python. I'm going to code it up and see what happens. Okay, so remember last time we did gradient descent, so I'm going to copy this code over, and what we're going to do is modify it to do stochastic gradient descent.

[00:04:50] Okay, so just to recall: last time we set up some training examples, we defined the loss function, and then we had this generic optimization algorithm. Now, to really tell the difference between gradient descent and stochastic gradient descent, I'm going to make a larger data set, and I'm going to do it in a way so that it's large but structured, so that we know what the right answer is, because otherwise how can we verify that it did the right thing?

[00:05:17] To do this, a general trick is to generate synthetic data from a ground truth, and then try to recover that ground truth. So suppose we had some true weight vector. This is our secret, which is unknown to the learning algorithm, but we hope that the learning algorithm will recover it. Then we're going to define a function called generate, which uses this true w to generate an example. Here I'm going to generate x by randomly sampling a five-dimensional input point, and then I'm going to set y to be true w dot x. So the examples are generated from the true weight vector, and then I'm just going to add some noise.

[00:06:18] Okay, and then I'm going to set the training examples to be just calls to generate, for, let's say, one million examples. That's a lot of examples. All right, let's see what this data looks like; I'm going to print out x and y just to see what is coming out. Oops, I had a typo here. Okay, so here is the data set that we are going to train on: each input x is a five-dimensional vector, the output is a scalar, and there are a lot of examples.

[00:07:10] All right, so I need to update the feature vector to be just x, the identity, and the initial weight vector has to match the dimensionality of the true weight vector. Everything else, the training loss and gradient, I'm going to leave alone. Okay, so now let's uncomment this line and run gradient descent, and let's see what happens.
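A sketch of the synthetic-data setup being described (the on-screen code is not in the transcript, so the noise scale and details here are guesses; the true weight vector [1, 2, 3, 4, 5] is the one the lecture later tries to recover, and I use fewer than the lecture's one million examples to keep the sketch light):

```python
import random

true_w = [1, 2, 3, 4, 5]  # the "secret" true weight vector

def generate():
    # Sample a random five-dimensional input point.
    x = [random.random() for _ in range(5)]
    # Output is true_w . x plus a little noise.
    y = sum(wj * xj for wj, xj in zip(true_w, x)) + random.gauss(0, 1)
    return (x, y)

# The lecture uses 1,000,000 examples; 100,000 keeps this sketch light.
train_examples = [generate() for _ in range(100000)]
print(train_examples[0])  # one (x, y) pair
```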
[00:07:42] Okay, so it's going to generate the data, and now, to compute a single gradient, it has to enumerate over one million examples. So this is going to be quite slow. It'll finish the first epoch, and it has some values, and then the second epoch, and it seems like it's making some progress. Remember, we want to see if this can hit one, two, three, four, five. The loss is going down, which is good, and it seems like it's moving in the right direction, but it's pretty slow, and I'm just going to stop it there because I don't want to wait forever.

[00:08:26] Okay, so now let's do stochastic gradient descent. First I need to change the interface, because gradient descent only had access to f and the gradient of f, and now stochastic gradient descent needs access to the individual losses. So I'm going to define, actually I'll just call this, the loss of w, and I'm going to use i here to denote an index into one of the terms in the sum. So the loss is just going to be one of these terms, and the term I'm going to select out is just the ith data point. Okay, and similarly, the gradient of the loss is going to be just the gradient, but for the ith data point, and this also takes in the index i. So now, if I feed in various values of i, I can access the loss and the gradient of that loss for any given weight vector.

[00:09:46] All right, so now let's go over to the optimization algorithm, and let me do stochastic gradient descent. Okay, so I'm going to call this stochastic gradient descent, and, just to distinguish things, I'm going to use lowercase f for the individual components of the objective function.
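The per-example loss and gradient just described might look like this in code: a sketch assuming squared loss with identity features (consistent with the regression setup), with a tiny stand-in data set so the snippet runs on its own.

```python
# Tiny stand-in for the generated training set: (x, y) pairs.
train_examples = [([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def loss(w, i):
    # The i-th term of the training loss: (w . x_i - y_i)^2.
    x, y = train_examples[i]
    return (dot(w, x) - y) ** 2

def gradient_loss(w, i):
    # Gradient of that single term: 2 * (w . x_i - y_i) * x_i.
    x, y = train_examples[i]
    scale = 2 * (dot(w, x) - y)
    return [scale * xj for xj in x]
```

Feeding in different values of i picks out different terms; for example, `loss([0.0, 0.0], 0)` evaluates to 4.0.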
[00:10:10] Okay, so I'm going to initialize the weight vector, and I'm going to use a different step size here, just for fun. I was going to initialize it with one... actually, let me do this instead: I'm going to set the step size to be one over the square root of the number of updates, and each time I do an update, I'm going to increase the number of updates. Actually, let me do it in this order. Okay, so the number of updates starts at zero, and then, remember, in stochastic gradient descent, I'm going to loop over the components of the objective function, so i from 0 to n minus 1.

[00:11:11] Another thing I'm going to have to pass in is the number of components, which I'm going to use to index into f. So now this is f of w and i, and gradient of f of w and i, and then I'm going to move everything inward. Okay, so now, to call this function, I'm going to run stochastic gradient descent with the loss and the gradient of the loss, and I'm going to pass in n, which is the number of training examples, and an initial weight vector.

[00:11:53] Okay, so let's just review what's going on here. Stochastic gradient descent takes a function which can access individual components of the objective. It initializes the weights and then iterates some number of times, and in each epoch it loops over all the examples, computes the value, computes the gradient, and then does a gradient update, and here I'm using the step size, which is one over the square root of the number of updates I've made so far.

[00:12:32] Okay, so let's see stochastic gradient descent in action now. I have two returns here, so that is a syntax error; let me fix that. So now it's going through one million examples... oh, I need to import math as well. So it's going to loop over one million examples, but after each example it's going to perform an update, and so when it prints out, it will already have taken one million steps of stochastic gradient descent.

[00:13:11] And look what happened here: after the first epoch, it's already quite close to one, two, three, four, five. The function value doesn't really mean as much, because it's only for an individual point, but you can see that the weight vector is converging quite nicely. And this shows that stochastic gradient descent, even with just one pass over the training data, can sometimes get much closer to the optimum than if you were to do many, many rounds of gradient descent.
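Putting the whole walkthrough together, a reconstruction might look like the following. This is not the lecture's exact code: the data set here is small and noiseless so the sketch runs quickly, but the shape (per-example f and gradient, one update per example, step size 1/sqrt(number of updates)) follows the description above.

```python
import math
import random

random.seed(0)
true_w = [1.0, 2.0, 3.0, 4.0, 5.0]

# Small noiseless synthetic set (the lecture uses 1,000,000 noisy examples).
train = []
for _ in range(1000):
    x = [random.random() for _ in range(5)]
    train.append((x, sum(wj * xj for wj, xj in zip(true_w, x))))

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def f(w, i):
    # Per-example squared loss: one component of the objective.
    x, y = train[i]
    return (dot(w, x) - y) ** 2

def gradient_f(w, i):
    x, y = train[i]
    s = 2 * (dot(w, x) - y)
    return [s * xj for xj in x]

def stochastic_gradient_descent(f, gradient_f, n, w, epochs=5):
    num_updates = 0
    for epoch in range(epochs):
        for i in range(n):  # one update per example
            grad = gradient_f(w, i)
            num_updates += 1
            eta = 1.0 / math.sqrt(num_updates)  # decreasing step size
            w = [wj - eta * gj for wj, gj in zip(w, grad)]
        print(f"epoch {epoch}: w = {[round(wj, 2) for wj in w]}")
    return w

w = stochastic_gradient_descent(f, gradient_f, len(train), [0.0] * 5)
```

After a few epochs the weight vector should be close to the true [1, 2, 3, 4, 5], mirroring the fast convergence seen in the lecture.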
[00:13:50] Okay, so that was stochastic gradient descent in Python. So let's summarize. We want to optimize this training loss, which is an average over the per-example losses. We looked at gradient descent, which takes a step on the gradient of the training loss, and we also looked at stochastic gradient descent, which picks out individual examples and updates after computing the gradient of each individual example. And on this example we've shown that stochastic gradient descent wins. The key idea behind stochastic updates is that it's not about quality, it's about quantity. So maybe not a general life lesson, but it seems like in this case it is wiser to keep in mind what you're trying to do, which is to optimize this objective, rather than to compute the gradient, which is only a means to an end. Okay, so that concludes the module on stochastic gradient descent. Thanks for listening.

================================================================================ LECTURE 008 ================================================================================

Artificial Intelligence and Machine Learning 5 - Group DRO | Stanford CS221: AI (Autumn 2021)

Source: https://www.youtube.com/watch?v=ZFK2XtWqUbw

---

Transcript

[00:00:05] Hello. In this module I'm going to first show you how minimizing the average error on your training examples can actually lead to disparities in performance between groups, and then I'm going to show you a simple approach, called group distributionally robust optimization, that can mitigate some of these disparities.

[00:00:24] Let me begin with a very famous example of disparities, or inequalities, in machine learning, called the Gender Shades project. In this project the authors collected a data set of images of faces of different genders and different skin tones.
[00:00:42] Then they evaluated gender classifiers from Microsoft, Face++, and IBM. What they found was rather striking. For the group of lighter-skinned males, the classifiers were almost perfect, but if you look at the performance of those classifiers on darker-skinned females, you'll see that the accuracies are much, much worse. So this is a general problem in machine learning: inequalities between different groups arise because machine learning generally minimizes the average loss.

[00:01:26] These inequalities can have real-world consequences. In one vivid case, a Black man was wrongly arrested due to an incorrect match with another Black man captured in a surveillance video, a mistake made by a facial recognition system. Given what we just saw in the Gender Shades project, we can see that lower accuracies for some groups might lead to more false arrests, which adds to already problematic inequalities that exist in our society today.

[00:01:58] So in this module I'm going to focus on this issue of performance disparities between groups and how we might be able to mitigate them. But I also want to highlight that, even if we didn't have any disparities between groups, there's a question of whether facial recognition technology should be used in law enforcement, or in surveillance, or at all. These are big, thorny ethical questions which, unfortunately, we're not going to be able to spend much time on in this module, but I want to highlight that it's important to remember that sometimes the issue is not with the solution but with the framing of the problem itself.

[00:02:37] Gender Shades was an example of classification, but to make things simpler, let us consider our friend, linear regression.
[00:02:45] So recall that in linear regression we start with a training set which consists of examples, where each example has an input x and an output y. But in our case we're going to assume each example is also annotated with a group g. So let's plot this over here: here's (1, 4), and here's a second example, (2, 8), which is up here. And then these examples down here are going to come from group B, so we're going to have two groups, A and B, and here they are.

[00:03:24] Okay, so the goal of machine learning, or linear regression in particular, is to produce a predictor, such as this one. The predictor is going to take new inputs, such as 3, and produce an output, such as 3.27.

[00:03:43] In linear regression we assume that the predictor has the form of a weight vector dotted with a feature vector, phi of x. In this simple example we're going to restrict ourselves to the case where the feature vector is simply the identity map, just x, which gives us a hypothesis class that is the set of all lines through the origin. So you can think about sweeping lines through the origin here, and the weight vector is just going to be a single number, w.

[00:04:19] Already you can see some tension here: which weight vector would you choose? Would you choose one that's closer to these points in group B, or in group A? This tension means that we have to compromise somehow, and exactly how we compromise is going to have implications.

[00:04:41] Notice also that the predictor doesn't use the group information; it just takes an input x as before. What's going to use the group information is the learning algorithm, and we'll get to that a little bit later.
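As a tiny illustration of this hypothesis class: with identity features the predictor is just f_w(x) = w * x, a line through the origin. The weight 1.09 below is my inference from the quoted example (input 3, output 3.27), not a number stated in the lecture.

```python
def predict(w, x):
    # Linear predictor with identity features: a line through the origin.
    return w * x

print(round(predict(1.09, 3), 2))  # the lecture's example input
```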
[00:04:55] So, just as a review: for linear regression we define the loss function on an input x, an output y, and a particular weight vector to be simply the squared difference between the predicted value of the predictor, f_w(x), and the target value y. And remember that we defined the training loss of a particular weight vector as follows: it's the average (so one over the number of training examples) of the sum over training examples of the per-example loss.

[00:05:32] Visually, we can see this on this plot, where for each value of w (in our case, remember, w is a scalar) we get a loss value. So this is the training loss, which is this curve here. Let's practice evaluating this training loss at a particular value of w, say 1. This is going to take the average over this data set and return some value: 7.5. Okay, so the average loss at w equals 1 is 7.5.

[00:06:19] That seems okay, but now let's peer a little bit closer at how the loss is spread across groups. We're going to define a notion of a per-group loss. Here's our training set: for group A, what is the loss, and for group B, what is the loss? Formally, we define the per-group loss, written TrainLoss_g for group g (where g can be either A or B), to be the average, now only over those examples in that group, of the per-example loss. This notation, D_train(g), is the set of examples in group g.

[00:07:03] Okay, so we're going to plot these two losses on this curve here, and we see that we have these two plots: TrainLoss_A looks like this and TrainLoss_B looks like that.
we can practice evaluating um these different loss functions at our uh [00:07:24] different loss functions at our uh example weight vector one here so train [00:07:27] example weight vector one here so train loss a is going to be an average [00:07:29] loss a is going to be an average remember only over the examples in group [00:07:32] remember only over the examples in group a [00:07:33] a and that's going to give us 22.5 [00:07:36] and that's going to give us 22.5 you can see [00:07:37] you can see it looks like about 22.5 here [00:07:40] it looks like about 22.5 here and then what about b [00:07:42] and then what about b so b actually gets a loss of zero [00:07:46] so b actually gets a loss of zero which you can see at this point [00:07:48] which you can see at this point so you can see that we have a single [00:07:51] so you can see that we have a single wave vector [00:07:53] wave vector one [00:07:54] one gets very different losses on the two [00:07:57] gets very different losses on the two data sets [00:07:58] data sets on the two groups a is doing a lot worse [00:08:01] on the two groups a is doing a lot worse it has 22.5 and b is doing [00:08:04] it has 22.5 and b is doing much better it has a zero which is the [00:08:06] much better it has a zero which is the minimum loss you can hope for [00:08:09] minimum loss you can hope for so this is an example of a disparity [00:08:11] so this is an example of a disparity between [00:08:12] between if we were to choose wave vector one [00:08:14] if we were to choose wave vector one there would be a huge disparity between [00:08:16] there would be a huge disparity between the performance on [00:08:17] the performance on these two groups [00:08:20] so um [00:08:22] so um so we can look at the losses of [00:08:23] so we can look at the losses of different groups but it'll be helpful to [00:08:25] different groups but it'll be helpful to kind of summarize that as a single [00:08:27] kind of summarize that as a single 
[00:08:28] That number we're going to capture by a quantity called the maximum group loss, and you might guess from the name that the maximum group loss, written TrainLoss_max, is simply going to be the maximum over all groups of the per-group loss.

[00:08:49] Visually, what this looks like is as follows. Remember we had, in orange, the loss of group A, and in blue, the loss of group B. The maximum group loss is a function of w, like the other functions, and it is the pointwise maximum: at every point we choose whichever of the losses of A or B is larger. So, as you can see, it traces out an upper envelope: over here the loss of A is higher, so it's going to track that, and over here the loss of B is higher, so it kind of hugs B from there on.
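These definitions can be sketched in a few lines of Python. The toy dataset below is hypothetical, chosen only so that the numbers match the lecture's example (average loss 7.5, group A at 22.5, group B at 0 when w = 1):

```python
# Hypothetical 1-D regression data with predictor f(x) = w*x and squared loss.
# Chosen so that at w = 1: group A's loss is 22.5, group B's is 0, and the
# overall average loss is 7.5, matching the lecture's numbers.
D_train = [
    ("A", 2.0, 8.0),  # (group, x, y): residual at w=1 is -6, loss 36
    ("A", 1.0, 4.0),  # residual at w=1 is -3, loss 9
    ("B", 1.0, 1.0), ("B", 2.0, 2.0), ("B", 3.0, 3.0), ("B", 4.0, 4.0),
]

def per_example_loss(w, x, y):
    return (w * x - y) ** 2

def train_loss(w):
    """Average loss over all examples."""
    return sum(per_example_loss(w, x, y) for _, x, y in D_train) / len(D_train)

def group_loss(w, g):
    """TrainLoss_g: average loss over only the examples in group g."""
    pts = [(x, y) for grp, x, y in D_train if grp == g]
    return sum(per_example_loss(w, x, y) for x, y in pts) / len(pts)

def max_group_loss(w):
    """TrainLoss_max: the worst per-group loss (pointwise maximum)."""
    return max(group_loss(w, g) for g in {grp for grp, _, _ in D_train})

print(train_loss(1.0), group_loss(1.0, "A"), group_loss(1.0, "B"), max_group_loss(1.0))
# → 7.5 22.5 0.0 22.5
```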
[00:09:38] Okay, so let's evaluate at our point w = 1. Remember from the previous slide that the two losses are 22.5 and 0 for the two groups. To compute the maximum, we just take the max of these two values, and you get 22.5. So 22.5 is a single number that summarizes how bad the worst group is; that's the maximum group loss. And if you compare the maximum group loss, 22.5, with the average loss, which is 7.5, you'll see that the maximum group loss is larger, and it's always at least as large.

[00:10:24] So now let's compare these two loss functions. We have the average loss and the maximum group loss, and we can plot both of these here, so pictorially we can see what's going on. Let me just plot our data points as well, so we have them available. So these functions are definitely very different.
[00:10:55] Okay, so what happens now when we try to minimize the average loss versus the maximum group loss? Let's start with minimizing the average loss; this is standard learning, the status quo. You find the minimum of the average loss, which is going to be this point, w = 1.09, and it gets a loss of 7.29. So it looks like you're doing pretty well, but if you look at the worst group loss of that weight vector, you'll see that it's above 20, which is not great.

[00:11:30] So what you can do instead is what we call group distributionally robust optimization, or group DRO, which is simply going to minimize the maximum group loss; it's going to minimize this purple plot here. And what happens when you do that? You get w = 1.58, which gets a loss of 15.69, which is better than the 20-plus that you were at before.
[00:12:00] Now, of course, the average loss is worsened, because at this point the average loss on the red curve is a little bit higher, so there's a trade-off here. And we can see this tension play out on this plot over here. Here we see that if you were to minimize the average loss, you would find a regressor, or model, that's very close to the points over here in B, because there are four of them: the majority group kind of dominates. Whereas if you minimize the maximum group loss, you get this purple line, which is able to balance out the two groups no matter how many points are in one versus the other. So you can think of the purple line as more fair, because it treats the groups more equally.

[00:12:49] So how do we minimize the maximum group loss? As before, we're going to use gradient descent and follow our nose.
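This follow-your-nose procedure can be sketched as follows. The 1-D dataset, step size, and iteration count here are illustrative assumptions, so the resulting minimizer differs from the lecture's plotted numbers:

```python
# Sketch of group DRO by gradient descent on the maximum group loss:
# at each step, find the worst-off group (the argmax g*) and take a
# gradient step on that group's loss only. The toy 1-D dataset with
# predictor f(x) = w*x and squared loss is hypothetical.
D_train = [("A", 2.0, 8.0), ("A", 1.0, 4.0),
           ("B", 1.0, 1.0), ("B", 2.0, 2.0), ("B", 3.0, 3.0), ("B", 4.0, 4.0)]
GROUPS = ("A", "B")

def group_loss(w, g):
    pts = [(x, y) for grp, x, y in D_train if grp == g]
    return sum((w * x - y) ** 2 for x, y in pts) / len(pts)

def group_grad(w, g):
    pts = [(x, y) for grp, x, y in D_train if grp == g]
    return sum(2.0 * (w * x - y) * x for x, y in pts) / len(pts)

w, eta = 0.0, 0.01
for _ in range(1000):
    g_star = max(GROUPS, key=lambda g: group_loss(w, g))  # group hurting most
    w -= eta * group_grad(w, g_star)                      # update only on it

print(round(w, 2))  # settles near the point where the two group losses cross
```

With a fixed step size the iterate hovers around the crossing point of the two group losses; the resulting maximum group loss is lower than what the average-loss minimizer achieves on this data.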
[00:13:02] So what does this look like? Let me plot this. So here's the objective function: the maximum group loss, TrainLoss_max, is, remember, the maximum over all the groups of the per-group training loss. And so how do you take the gradient of a max? Well, the gradient of a max, remember, is equal to the gradient of the function being maximized, evaluated at the particular value g*, where g* is the argmax over groups of the per-group training loss.

[00:13:47] So let's look at this picture. Basically, we want to take the gradient of this purple curve, right? And if you're over here, the gradient of the purple curve is exactly the gradient of the loss on group A, and if you're over here, the gradient of the maximum group loss is exactly the gradient of the loss on group B.
[00:14:18] This exactly corresponds to the fact that g* is A over here, because group A is worse, and g* is B over here, because group B is worse.

[00:14:29] So to compute the gradient, it's actually very simple: you first evaluate, at your current weight vector, the losses of the different groups; you look at the group that is hurting the most, that has the highest loss; and then you just update on those examples. So it's a very intuitive process: you find which group needs the most help, and then you update your parameters based only on that group.

[00:14:59] So one note is that it's important that we're talking about gradient descent here, not stochastic gradient descent, because stochastic gradient descent relies on the objective function being a sum over terms, but this is a maximum over a sum, so it won't work exactly as is.
How exactly to get stochastic methods to work properly is beyond the scope of this module, but you can read the notes for pointers.

[00:15:30] Okay, so let me summarize. We've introduced the setting where examples are associated with groups. We've done this for regression, but it generalizes to classification and to more general machine learning problems. We saw that we have the average loss and the maximum group loss, and these are different: what is good on average is not going to be good for all groups. And we see that there's always a tension between the groups, if the groups are pulling you in somewhat different directions. And we saw that group distributionally robust optimization, or group DRO, is a very simple algorithm that minimizes the maximum group loss, the purple curve over here.

[00:16:19] Finally, I want to remark that this module has kept things simple, but there are many, many more nuances.
Intersectionality is the principle that a group such as, you know, white women is actually defined by multiple attributes, and such groups might behave differently than the coarser groups of women, or the set of white people, so we have to take finer-grained groups into account. There are also cases where we might not know what the groups are, maybe because we don't collect demographic information, and we have to infer them. There's also the issue of overfitting: we're talking only about the training loss here, just for simplicity, but of course what we care about in machine learning is doing well on a test set, which we're not talking about here. So there are many more references in the notes, and I hope this has piqued your interest.
I hope you come away realizing that inequality should be considered a first-class citizen when we think about machine learning methods. So that's it; thank you.

================================================================================ LECTURE 009 ================================================================================
Artificial Intelligence & Machine Learning 6 - Non Linear Features | Stanford CS221: AI(Autumn 2021)
Source: https://www.youtube.com/watch?v=eIxbNkB4byY
--- Transcript

[00:00:05] Hi, in this module I'm going to show you how you can use the machinery of linear predictors that we've developed so far to get some non-linear predictors. We're going to first focus on regression and then later talk about classification.

[00:00:19] So remember, in regression we're given some training data, and we have a learning algorithm that produces a predictor. The first key question, or design decision, is: which predictors is the learning algorithm allowed to choose from? That's the question of the hypothesis class.

[00:00:37] So for linear predictors, remember that the hypothesis class is defined to be the set of all predictors f(x) = w · φ(x), that is,
set of all predictors f x equals some weight vector dot some feature [00:00:49] some weight vector dot some feature vector phi of x [00:00:52] vector phi of x and we allow the wave vector to range [00:00:54] and we allow the wave vector to range freely over all d dimensional real [00:00:56] freely over all d dimensional real vectors okay so if we take phi of x [00:01:00] vectors okay so if we take phi of x equals one comma x like we did before [00:01:03] equals one comma x like we did before then [00:01:04] then we can get some lines [00:01:06] we can get some lines so if we set the weight vector to be one [00:01:09] so if we set the weight vector to be one comma zero point five seven then we get [00:01:11] comma zero point five seven then we get this line with an intercept at 1 and a [00:01:14] this line with an intercept at 1 and a slope of 0.57 [00:01:16] slope of 0.57 and here's a purple one with the [00:01:18] and here's a purple one with the intercept [00:01:19] intercept of 2 and a slope of 0.2 [00:01:23] of 2 and a slope of 0.2 so all is good [00:01:25] so all is good but what happens if we get data that [00:01:27] but what happens if we get data that looks like this if you try to fit a line [00:01:29] looks like this if you try to fit a line through it you won't be very happy with [00:01:31] through it you won't be very happy with this [00:01:32] this you really want to fit some sort of [00:01:34] you really want to fit some sort of non-linear predictor something that can [00:01:35] non-linear predictor something that can curve around to fit the data [00:01:39] curve around to fit the data so your first reaction might be to reach [00:01:41] so your first reaction might be to reach for something like neural networks or [00:01:43] for something like neural networks or decision trees something that's more [00:01:45] decision trees something that's more complex but let's see how far we can get [00:01:47] complex but let's see how far we can get with just using 
[00:01:51] So the key thing is that the feature vector can be arbitrary. So let's take the feature vector to be [1, x] as before, but let's just add on an x² term, just for fun. So, for example, if we feed in x = 3, then we get the feature vector [1, 3, 9]. Let's define some weights, [2, 1, -0.2], and let's plot what that function looks like, and we get a nice curve. So that's a non-linear predictor: it has an intercept of 2, a slope of 1 at the origin, and a curvature of -0.2. Here's another one, [4, -1, 0.1]: there's an intercept of 4, a slope of -1, and a curvature of 0.1. And here's another one, [1, 1, 0].
So what does this one look like? This one just looks like a line, because we've used a zero weight on the x² term, so it just reduces to a linear predictor. In general, we can define the family of all quadratic predictors by letting the weight vector range freely over all three-dimensional vectors. So here is our first example of getting a non-linear predictor, in particular quadratic predictors, just by changing φ.

[00:03:26] One small note here is that in one dimension, x² is just a single feature, but if x were d-dimensional to begin with, then to get the full range of quadratic predictors we would need d² features, one for every (x_i, x_j) pair. So that would be a lot; that's one slight disadvantage of using the machinery of linear predictors to get non-linear predictors. Let's move on.
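The quadratic feature map and predictor just described can be sketched in plain Python, using the lecture's example weights:

```python
def phi(x):
    """Quadratic feature map from the lecture: phi(x) = [1, x, x^2]."""
    return [1.0, x, x * x]

def f(w, x):
    """Predictor w . phi(x): linear in w, non-linear in x."""
    return sum(wj * fj for wj, fj in zip(w, phi(x)))

print(phi(3.0))              # → [1.0, 3.0, 9.0]

w = [2.0, 1.0, -0.2]         # intercept 2, slope 1 at origin, curvature -0.2
print(f(w, 0.0))             # → 2.0  (the intercept)
print(round(f(w, 3.0), 3))   # → 3.2  (2 + 3 - 0.2*9)
```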
So quadratic predictors are great, but they can only vary smoothly. What happens if you want a function that looks like this? So here's an example of a piecewise constant predictor, and we can get this predictor, too, by just reimagining what a feature vector is. So here I'm going to define φ(x) as follows: first I'm going to carve up the input space into a bunch of regions, and define each feature to be whether x lies in that region or not. The first feature tests whether x is between 0 and 1, and the indicator function will return 1 if that's true and 0 otherwise; the second one is going to test whether x is between 1 and 2; and so on. So here's an example: if you punch in 2.3, it is 0 on all the features (regions) except for this one.

[00:04:59] Okay, so if I set the weight vector to [1, 2, 4, 4, 3], then I get this function.
And notice that each weight is just identifying the function value on its region: between 0 and 1 the function is at 1, and then it's 2, and then it's 4, and then it's 3. Okay, so here's another one: it's 4, and then 3, 3, 2, 1.5. And again, in general, the set of predictors is w · φ(x), where w can range freely. So this is a general technique, piecewise constant functions, which can give you expressive non-linear predictors by partitioning the input space.

[00:05:53] Again, a caveat is that everything looks nice in one dimension, but if x were d-dimensional and each dimension were carved up into B regions, then you would have B^d different features, which is an exponential number of features, which is kind of a no-go.

[00:06:14] So you can kind of get the idea now, but let's just do another example.
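A sketch of this piecewise constant construction, assuming unit-width regions on [0, 5) to match the five weights in the first example above:

```python
# Piecewise constant features: one indicator per unit-width region.
# The region boundaries [0,1), [1,2), ..., [4,5) are an assumption
# chosen to match the lecture's five example weights.
def phi(x):
    """phi_j(x) = 1[j <= x < j+1] for j = 0..4."""
    return [1.0 if j <= x < j + 1 else 0.0 for j in range(5)]

def f(w, x):
    return sum(wj * fj for wj, fj in zip(w, phi(x)))

w = [1, 2, 4, 4, 3]    # each weight is the function value on its region

print(phi(2.3))        # → [0.0, 0.0, 1.0, 0.0, 0.0]  (only region [2,3) fires)
print(f(w, 2.3))       # → 4.0  (the function value on region [2, 3))
```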
[00:06:25] Suppose you're trying to predict a function with some periodic structure, like you're trying to predict traffic patterns or sales across a year. So imagine that you want to get a function that looks like this. Okay, so let's see if we can hack together a feature vector that does that: φ(x) = [1, x, x²], so put in the quadratic, and now let's add a cosine term, cos(3x), which is kind of arbitrary.

[00:06:50] So here's an example: if you punch 2 into x, then you get this feature vector. If you define the weights in a certain way, then you get that red curve; define the weights this way, and you get the purple curve; and so on. So here the key idea is that you can really go wild: you can throw in any sort of features you want and get all sorts of wacky-looking predictors, all using the machinery of linear predictors.
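Such a hand-built periodic feature vector can be sketched directly; the cos(3x) frequency is taken from the lecture, while the weights here are illustrative assumptions:

```python
import math

# "Go wild" feature map: quadratic terms plus a periodic one.
# The weights below are hypothetical, just to show evaluation.
def phi(x):
    return [1.0, x, x * x, math.cos(3 * x)]

def f(w, x):
    return sum(wj * fj for wj, fj in zip(w, phi(x)))

w = [1.0, 0.5, -0.05, 2.0]   # hypothetical weights

print(phi(2.0))   # feature vector at x = 2: [1.0, 2.0, 4.0, cos(6)]
print(f(w, 2.0))  # the predictor still evaluates as a plain dot product
```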
[00:07:23] So you might say: wait a minute, wait a minute, how are we able to get all this expressive non-linear capability when we haven't really changed the learning algorithm, and it's still supposed to be a linear predictor, right? Well, that's because the word "linear" is a little bit ambiguous here. So remember, the prediction is w · φ(x); that's the score. And the question is: linear in what? Is the score linear in w? Yes, because the score is just some constant times w. Is it linear in φ(x)? Yes, because it's something times φ(x). How about: is it linear in x? Well, the answer is no, because φ(x) can be arbitrary, so the score doesn't have to be linear in x.

[00:08:14] And the key idea behind non-linearity is that there are two ways of viewing it. From the point of view of gaining expressive non-linear predictors, this is great, because you can define φ(x) to be anything and get arbitrary non-linear functions out.
[00:08:29] But from the point of view of having to learn such a model, it's actually also great, because the score is a linear function of w, and when you're learning, you take the gradient with respect to w, so the score is just a linear function and life is great. In fact, the learning algorithm doesn't even care what φ is: it only looks at the data through the lens of φ(x). It doesn't know whether you gave it x and then applied φ, or you just gave it φ(x) directly.

[00:09:00] Okay, so now let's turn from regression to classification. The story is pretty much the same: you can define arbitrary features and get non-linear classifiers. But just to review: remember, in linear classification in two dimensions, you define the feature vector to be [x1, x2], and you define the predictor as the sign of the score.
partition the space: it defines the decision boundary which separates the region of the space which is labeled plus from the region of the space which is labeled minus. [00:09:39] Okay, so now what does non-linear mean? Well, if you look at f of x, because of the sign function it's already non-linear, so that notion doesn't really make sense. Instead, non-linearity for classification refers to whether the decision boundary is linear or not; in particular here, is it a line? And if we define the feature vector as x1, x2, then we just get a line. [00:10:04] So now let's try to do something a little bit more interesting: let's see if we can define a quadratic classifier. Suppose we wanted to define a classifier that looks like this. The decision boundary is a circle, where inside the circle we want to label plus and outside we want to label minus. Okay, so how are we going to do that? Well, let's start with a feature
vector equal to x1, x2 as we had before, and now we're just going to tack on a quadratic term, x1 squared plus x2 squared. [00:10:44] And now, if you define the corresponding weight vector to be 2, 2, minus 1, then I claim that this gives you exactly this decision boundary, which is a circle. There's some algebra that you can do, which I'm going to skip over, but you can rewrite this expression as follows: the same f of x is equal to 1 if this quadratic form is less than or equal to 2. So what is this? You might remember from your algebra and trigonometry days that this is the squared distance from a point to the point (1, 1). So in particular, if I constrain the squared distance to be less than or equal to 2, then this is the region of points within radius square root of 2 of the point (1, 1).
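The claim above is easy to check numerically. A minimal sketch, using the feature vector and weight vector exactly as stated in the lecture:

```python
import numpy as np

# phi(x) = [x1, x2, x1^2 + x2^2] and w = [2, 2, -1], as in the lecture.
def phi(x1, x2):
    return np.array([x1, x2, x1 ** 2 + x2 ** 2])

w = np.array([2.0, 2.0, -1.0])

def f(x1, x2):
    # sign(w . phi(x)): +1 inside the circle of radius sqrt(2) at (1, 1)
    return 1 if w @ phi(x1, x2) >= 0 else -1

print(f(1, 1))    # center of the circle -> 1
print(f(1, 2))    # squared distance to (1,1) is 1 <= 2 -> 1
print(f(3, 3))    # squared distance to (1,1) is 8 >  2 -> -1
```

The score 2*x1 + 2*x2 - (x1^2 + x2^2) is nonnegative exactly when (x1-1)^2 + (x2-1)^2 <= 2, which is the circle described in the lecture.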
[00:11:41] That is exactly the circle we drew, and everything else is classified as minus 1. So we successfully got the decision boundary to be a circle. [00:11:54] Okay, so let me try to take one more step to reconcile this tension between linear in phi of x and non-linear in x. So what we're going to do here is the following. Remember, in the input space x, this decision boundary is a circle, and in feature space you can see that the decision boundary is a line. So here is a cool animation that I found on YouTube which I think really nicely illustrates this. It's done in the context of SVMs, but the idea is the same. Here we have points inside the circle and outside the circle; in the ambient x space they're not separable. What we're going to do is apply the feature map, and the feature map, remember, adds this third dimension, x1 squared plus x2
squared, and now we're in feature space, which is 3D. And in 3D we can actually slice a linear predictor that separates the red and the blue points, and that separation induces a circle in the original 2D space. [00:13:10] Okay, to summarize: linear is ambiguous. We have a predictor, in the case of regression, which is w dot phi of x. It's linear in w and in phi of x, but it's non-linear in x, and this is what allows us to get non-linear predictors using the machinery of linear predictors. We saw that for regression, non-linearity refers to the predictor directly, and for classification it refers to the decision boundary. We also saw many types of non-linear features: quadratic features, piecewise constant features, periodic features, and again, you can make up your own features for the application you have in mind. [00:13:50] So next time someone on the street asks you about
linear predictors, you first have to clarify: linear in what? [00:13:57] Okay, that's the end.

================================================================================ LECTURE 010 ================================================================================
Artificial Intelligence & Machine Learning 7 - Feature Templates | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=2QfSBLtvioE
---
Transcript

[00:00:05] Hi, in this module I'm going to talk about how to use feature templates to organize and construct your features in a very flexible way. So recall that a hypothesis class is the set of all predictors that a learning algorithm is going to consider. In the case of linear predictors, we've looked at predictors where f of x is equal to, in the case of regression, w dot phi of x, or in the case of classification, the sign of that quantity, and we allow the weight vector w to vary freely. [00:00:41] Okay, so we can visualize the hypothesis class as follows. Imagine the space of all possible predictors, all possible functions mapping x to y. When you define a feature extractor phi,
what you're doing is committing to a particular subset of all possible predictors, and usually you do this using prior knowledge. The second part is the learning algorithm: given the hypothesis class, script F, you're asking the learning algorithm to choose a particular predictor from that set based on data. [00:01:21] So intuitively, we want the hypothesis class script F to contain the good predictors, of course. It can also contain some bad ones, because those will be filtered out on the basis of data, but we don't want it to be so big that the learning algorithm has trouble identifying the good predictors among the bad predictors. [00:01:45] So let's look at an example task, and I want to give you an idea of how to choose the feature extractor. Suppose you're given a string such as abc@gmail.com and you're asked to predict
whether this is a valid email address or not, using a linear classifier. So in this case, what we have to do is identify the feature extractor phi. When you're designing a feature extractor, the main question you ask yourself is: what properties of the input x might be relevant for predicting y? Of course, you don't necessarily want to commit to a particular aspect being important, because you don't know that; you want to learn that from data. But you should give the learning algorithm some guidance. [00:02:37] So what we're going to do is define the feature extractor as: given x, produce a set of (feature name, feature value) pairs. In this particular example, the feature extractor is going to produce a feature vector, and in this case we might look at the length: is it greater than 10? Here that's 1, because length has
something to do with whether it's a valid address. The fraction of alphanumeric characters: 0.85 in this case. Contains an at sign: that's 1, because it does contain an at sign. Ends with com: that's 1 here. And does it end with dot org: that's 0 here. [00:03:22] Okay, so this is a feature vector that we might construct for this particular application. [00:03:30] So now we go to prediction. Remember that we've previously defined the feature vector to just be a real vector; it's just a list of numbers. So what we've done right now is to annotate, or comment, each component of that feature vector with a name that describes what that component is about. We can do the same thing with the corresponding weight vector: here is a weight vector, just a list of numbers, and we can annotate each component with the name of the corresponding feature. [00:04:04] And recall that the score is just the dot product, w dot phi of x.
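The feature extractor just described can be sketched as a small function returning (feature name, feature value) pairs. This is a hypothetical sketch: the feature names and the threshold of 10 follow the lecture's description, but the exact naming is my own.

```python
# Minimal sketch of the email-address feature extractor from the lecture.
def extract_features(x: str) -> dict:
    return {
        "length>10":          1.0 if len(x) > 10 else 0.0,
        "fracOfAlphanumeric": sum(c.isalnum() for c in x) / len(x),
        "contains_@":         1.0 if "@" in x else 0.0,
        "endsWith_com":       1.0 if x.endswith("com") else 0.0,
        "endsWith_org":       1.0 if x.endswith("org") else 0.0,
    }

print(extract_features("abc@gmail.com"))
```

On "abc@gmail.com" this yields length>10 = 1, a fraction of alphanumeric characters of 11/13 (about 0.85, matching the lecture), contains_@ = 1, endsWith_com = 1, and endsWith_org = 0.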
[00:04:09] Just to write out the dot product: it's a sum over all the features, or components, of wj, the weight of that feature, times the feature value. Okay, so here's an example: the weight of length greater than 10 is minus 1.2, the feature value is 1, so you have that product here, and you have all the other features. [00:04:37] So a little piece of intuition here: you can think about the score as follows. Remember, in classification, positive scores result in a positive classification and negative scores result in a negative classification. You can think of each feature as providing a vote. If, let's say, phi of x sub j is 1 and wj is positive, that means it's voting in favor of a positive classification, and if wj is negative, it's voting in favor of a negative classification, and the magnitude of wj determines the strength of the vote.
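The voting intuition can be made concrete with the dictionary representation. A minimal sketch: only the weight of -1.2 for length>10 comes from the lecture; the other weights here are made up for illustration.

```python
# Feature values for "abc@gmail.com" (from the running example) and a
# weight vector; all weights except length>10 = -1.2 are hypothetical.
features = {"length>10": 1.0, "fracOfAlphanumeric": 0.85,
            "contains_@": 1.0, "endsWith_com": 1.0, "endsWith_org": 0.0}
weights  = {"length>10": -1.2, "fracOfAlphanumeric": 0.6,
            "contains_@": 3.0, "endsWith_com": 2.2, "endsWith_org": 1.4}

# score = sum_j w_j * phi(x)_j: each feature casts a vote whose sign
# comes from w_j and whose strength is |w_j| * phi(x)_j.
score = sum(weights[name] * value for name, value in features.items())
label = 1 if score >= 0 else -1
print(score, label)
```

Here the single negative vote from length>10 is outweighed by the positive votes, so the classification comes out positive.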
[00:05:15] So that's another way to interpret the dot product. Previously we saw that we can interpret it as the cosine of the angle, which is a more geometric interpretation. [00:05:28] So far we've seen that we can take inputs, define arbitrary feature extractors, get our feature vectors out, and do learning. But how do we choose these features? I just kind of made up at, com, and org. Which ones do we include? So far we've used some prior knowledge, but it's very easy in this manner to miss some; what about other suffixes, for example? We need a more systematic way of doing this, and this is where feature templates come in. A feature template is simply a group of features, all computed in a similar way. [00:06:08] So here's an example. With the input abc@gmail.com, we're
going to write the feature template as simply an English description with a blank, and that blank is meant to be filled in with an arbitrary value: last three characters equals blank. By instantiating that blank with all sorts of different values, we begin to realize the features that are actually defined by this feature template. [00:06:44] The important part here is that we no longer have to say which suffixes are important; we don't have to say what particular patterns to look at. We just have to know that there exists some suffix that might be important, define this feature template, and let the learning algorithm sort out which of these many features are actually relevant. [00:07:11] So to continue this example: the input is abc@gmail.com, and we define this feature template, which can be instantiated by substituting something like dot com.
we can also define this other feature [00:07:23] um we can also define this other feature template length greater than blank [00:07:26] template length greater than blank and we can type plug in one two three [00:07:28] and we can type plug in one two three four five six seven eight nine ten and [00:07:30] four five six seven eight nine ten and so on into that [00:07:31] so on into that um some feature templates [00:07:33] um some feature templates are don't have a blank and that's okay [00:07:35] are don't have a blank and that's okay because that just corresponds to specify [00:07:38] because that just corresponds to specify one single feature and that has a [00:07:40] one single feature and that has a particular value [00:07:42] particular value so here's another example so suppose the [00:07:45] so here's another example so suppose the input is an aerial image along with some [00:07:48] input is an aerial image along with some metadata about the location [00:07:50] metadata about the location so you can go figure out where this [00:07:51] so you can go figure out where this actually is [00:07:52] actually is um [00:07:53] um so the feature template in this case we [00:07:56] so the feature template in this case we might want to look at the following so [00:07:59] might want to look at the following so we want to look at the pixel intensity [00:08:01] we want to look at the pixel intensity of this image at a particular row [00:08:05] of this image at a particular row and a particular column [00:08:07] and a particular column and it's a color image so there's three [00:08:10] and it's a color image so there's three channels rgb so we for identify a [00:08:13] channels rgb so we for identify a particular channel that we're looking at [00:08:15] particular channel that we're looking at so this might be instantiated as the [00:08:17] so this might be instantiated as the pixel intensity of image at row 10 and [00:08:19] pixel intensity of image at row 10 and column 
[00:08:25] Another feature template might look at the metadata, the location, and be a feature on whether the latitude is in a particular range and the longitude is in a particular range. This feature template gets instantiated with particular values that denote ranges. If you remember piecewise constant features, this is an example of them: it carves up the world into a bunch of regions and has a feature firing if the lat-long is in a particular region or not. [00:09:09] So one thing you might notice is that feature templates are pretty flexible, but sometimes they can give rise to a lot of features. Last character equals blank: there are already 26 features if you only include lowercase letters. And
furthermore, most of these feature values are zero. In these cases, this is what we mean when we say a feature vector is sparse, and you can represent sparse feature vectors more compactly, as a dictionary mapping the feature name to the actual feature value. [00:09:44] So in general, there are two ways you can represent feature vectors: one is using arrays and one is using dictionaries. If your feature vector looks like this, which is dense, not sparse, meaning the feature values are mostly non-zero, then you might want to just represent it as an array: order the features somehow and just list out the numbers. But in cases where your feature vector looks like this and has lots of zeros, then it will be more efficient to represent it as a dictionary, where again you specify the feature name, colon, the feature value of only the non-zero elements, and by convention anything that is not mentioned has a value of zero.
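With that convention, the dot product only needs to touch the non-zero entries. A minimal sketch of a sparse dot product over dictionary-represented vectors:

```python
# Sparse dot product: iterate only over the nonzero entries of the
# feature dictionary; any key absent from the weights is treated as 0.
def sparse_dot(weights: dict, features: dict) -> float:
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())

w   = {"endsWith_com": 2.2, "contains_@": 3.0}
phi = {"endsWith_com": 1.0, "endsWith_moc": 1.0}  # unknown feature: weight 0
print(sparse_dot(w, phi))   # 2.2
```

The cost is proportional to the number of features that actually fire, not to the total number of features the templates could ever generate.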
[00:10:31] So one interesting advantage of sparse features is that you don't have to instantiate all the features a priori, in advance. As data comes in, you only lazily build up these features over time, whereas if you were doing things in a dense way, you would have to pre-define the fixed set of features that you're going to be working with. Now, in recent years, with deep learning, dense features and arrays have become much more ubiquitous, partly because you can take advantage of fast matrix computations on the GPU. [00:11:10] So to summarize: we want to identify hypothesis classes, and in this case we're looking at defining the hypothesis class with respect to the feature extractor. To define the feature extractor, we use feature templates, which are a convenient shorthand for unrolling a
single feature template into a bunch of different features. We also saw that in some cases the feature vectors are sparse, and therefore you can use a dictionary implementation to be more efficient. [00:11:43] Okay, so that's the end of this module. Thanks.

================================================================================ LECTURE 011 ================================================================================
Artificial Intelligence & Machine Learning 8 - Neural Networks | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=pnKXgBHuN58
---
Transcript

[00:00:05] Hi, in this module I'm going to talk about neural networks, a way to construct non-linear predictors via problem decomposition. [00:00:14] So when we started, we talked about linear predictors, and they were linear in two ways. First, the feature vector was a linear function of x, and second, the way that the feature vector interacted with the prediction was also linear. This gave rise to lines. [00:00:31] Next we talked about non-linear predictors, keeping the same linear machinery but just playing around
with the feature vector: by adding terms like x squared, you could get quadratic predictors and so on. [00:00:46] So now what we're going to do is define neural networks, where we can just leave phi of x, the feature vector, alone, and play with the way that the feature vector results in the prediction. That will allow us to get all sorts of fancy stuff. [00:01:05] So let me begin with a motivating example. Suppose you're trying to predict whether two cars are going to collide or not. The inputs are the positions of the two cars: x1 is the position of car 1, and x2 is the position of car 2.
And what you'd like to output is y: y = 1 if it's safe, and y = -1 if they collide. [00:01:34] What is unknown to the learner is the true rule: we're going to say that cars are safe if they're sufficiently far apart, so if the distance between them is at least 1, then we're safe. You can visualize this true predictor as follows. Here are axes x1 and x2, and you're going to draw these two lines; any point that is over here, or anything that is over here, is going to be labeled plus, which is safe, and anything in between is going to be labeled minus, meaning they will collide. [00:02:19] Okay, so let's do some examples. Suppose we have the point (0, 2), which is this point here; this is safe, so y = 1. (2, 0) is also safe, and
(0, 0) is here, which is not safe, and (2, 2) is y = -1, which is also not safe. [00:02:48] As an aside, this configuration of points is what was historically known as the XOR problem, and it was shown that pure linear classifiers could not solve it: you couldn't draw a line to separate the blue and the orange points. Nonetheless, we're going to show how neural networks can be used to solve this. [00:03:11] Okay, so the key intuition is the idea of problem decomposition. Instead of solving the problem all at once, we're going to decompose it into two subproblems. First, we're going to test whether car 1 is to the far right of car 2; in the picture, that corresponds to simply this region over here, which we're going to call h1. So h1 is whether x1 - x2 >= 1.
Then we're going to define another subproblem testing whether car 2 is to the far right of car 1, which is called h2; that corresponds to this region over here. And then we're going to predict safe if at least one of them is true: we just add the two values, each of which is either one or zero, and if at least one of them is one, we return +1. By convention, we're going to assume that the sign of zero is -1. [00:04:14] Okay, so here are some examples. Suppose we have (0, 2) again. For this point, h1 says nope, that's not on my side; h2 says yep, that's on my side; and at least one yes is enough to make the prediction +1. If you take (2, 0), that's this point: h1 says yep, h2 says nope, and f is 1, because all it takes is one. For (0, 0), this point, both of them say no, and it's -1.
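This true function can be sketched directly in code (a minimal illustration; the names h1, h2, f follow the lecture's notation):

```python
def h1(x1, x2):
    # Subproblem 1: is car 1 to the far right of car 2?
    return 1 if x1 - x2 >= 1 else 0

def h2(x1, x2):
    # Subproblem 2: is car 2 to the far right of car 1?
    return 1 if x2 - x1 >= 1 else 0

def f(x1, x2):
    # Predict safe (+1) if at least one subproblem fires;
    # by convention, the sign of 0 is taken to be -1.
    return 1 if h1(x1, x2) + h2(x1, x2) > 0 else -1

# The four example points from the lecture:
for point in [(0, 2), (2, 0), (0, 0), (2, 2)]:
    print(point, f(*point))   # (0,2) and (2,0) give +1; (0,0) and (2,2) give -1
```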
And same with (2, 2): both of them say no, so it's -1. [00:04:55] Okay, so far we've just defined the true function f. Of course, we don't know f, so what we're going to do is move gradually toward defining a hypothesis class, and the next step is to rewrite f using vector notation. So here are the two intermediate subproblems, and the predictor is f(x), which takes the sign. What we're going to do is write h1 in terms of a dot product between a weight vector and a feature vector. Here's the feature vector: [1, x1, x2]. Then we're going to define a weight vector, which is [-1, 1, -1], and if you look at the dot product, it's -1 + x1 - x2; if that quantity is greater than or equal to 0, then we're going to return 1, otherwise we return 0.
And you can verify that this is exactly a rewrite of that expression. Similarly, if you reverse the roles of x1 and x2, you can rewrite h2 in vector notation as well. [00:06:10] Now what we're going to do is combine h1 and h2 by stacking them: we form this matrix, which is just the two weight vectors stacked up, so we have two rows here, and we multiply this matrix by the feature vector. Remember, left multiplication by a matrix is just taking the dot product with each of the rows of that matrix. This produces a two-dimensional vector, and we test whether each component is greater than or equal to zero, so in the end h(x) is going to be a two-dimensional vector. [00:06:54] Okay, and now, given that, we can rewrite the predictor as simply the sign of the dot product between [1, 1]
and h(x), which is simply the sum of the two components. [00:07:07] So now we've written f(x), the true function, in terms of a bunch of matrix and vector multiplies. Everything in red here is just numbers, and so far we've specified what they are, but in general we're not going to know them, and we're going to have to learn them from data. [00:07:32] But before we do that, let's preemptively look at one problem that's going to come up, a problem we saw before when we tried to optimize the zero-one loss. So let's look at the gradient of h1(x) with respect to v1. We can plot this as follows: here is the score z, which is the dot product, and this is h1, which is just a step function. The step function, or threshold function, is just whether z is greater than or equal to zero: it's one over here and zero over here.
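The stacked-matrix form of the same true function can be sketched as follows (a rough illustration; the h2 row is obtained, as described, by swapping the roles of x1 and x2, and the top-level weights are [1, 1]):

```python
import numpy as np

def f(x1, x2):
    phi = np.array([1, x1, x2])          # feature vector [1, x1, x2]
    V = np.array([[-1, 1, -1],           # row 1: weight vector for h1
                  [-1, -1, 1]])          # row 2: weight vector for h2
    h = (V @ phi >= 0).astype(int)       # threshold each component -> h(x)
    score = np.array([1, 1]) @ h         # dot product with [1, 1]
    return 1 if score > 0 else -1        # sign, with sign(0) = -1

print([f(*p) for p in [(0, 2), (2, 0), (0, 0), (2, 2)]])  # [1, 1, -1, -1]
```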
Okay, so now if you try to do gradient descent on this, you're just going to get stuck, because the gradients are going to be 0 basically everywhere. [00:08:20] So the solution is to replace this threshold function with a more general activation function sigma, which has friendlier gradients. [00:08:30] Classically, and by classic I mean like in the 80s and 90s, people used the logistic function as the activation function, which looks like this; it's just kind of a smoothed-out version of the threshold function. In particular, its gradient is zero nowhere, which is just great: you can always move and make progress. There is a caveat here, which is that if you look out here, this function is pretty flat, which means the gradient is actually approaching zero; so if you're out here, you can get stuck, or at least make very
slow progress. [00:09:14] Then in 2012 the ReLU activation became popular, which just takes the max of z and zero, so it looks like this: if the input to the ReLU is less than zero, I'm just going to clip it to zero, and otherwise I'm going to leave it alone. Now this function actually has nice gradients over here: the gradient never vanishes, it's always positive and bounded away from zero, although over here it is zero. It turns out empirically that the ReLU activation function works really well, and it's simpler in a lot of ways, so it's kind of become the activation function of choice. [00:10:02] So the solution here is to replace this threshold step function with an activation function; choose your favorite, I would choose the ReLU, and now you have something that has non-vanishing gradients.
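As a small sketch, the two activation functions under discussion are just:

```python
import math

def logistic(z):
    # Classic smooth activation: 1 / (1 + e^(-z)). Its gradient is nonzero
    # everywhere, but approaches 0 for large |z| (the flat regions).
    return 1 / (1 + math.exp(-z))

def relu(z):
    # ReLU: clip negatives to zero, leave positives alone. The gradient
    # is exactly 1 for z > 0 and 0 for z < 0.
    return max(z, 0)

print(logistic(0))          # 0.5, the midpoint of the smoothed step
print(relu(-3), relu(2))    # 0 2
```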
[00:10:24] So let's now define two-layer neural networks using the machinery that we've developed so far. Okay, we're going to define some intermediate subproblems. We start with a feature vector φ(x); now, I'm going to represent vectors and matrices using these dots, so this is a six-dimensional feature vector, but in general it's d-dimensional. Next I'm going to multiply it by this weight matrix, which here is 3-by-6, but in general is a k-by-d matrix. That generates a three-dimensional, or in general k-dimensional, vector, and I'm going to send it through this non-linearity, the activation function, like the ReLU or the logistic, and we get a vector which I'm going to call h(x). [00:11:14] Okay, so now, given this h(x), I can do prediction by taking h(x) and simply dot-producting it with a weight vector w, and if I take the sign, that gives
me the prediction of that neural network. [00:11:34] So one thing that's kind of interesting here is that if you look at this equation, it pretty much looks like the equation for a linear classifier; the only difference is that now we have h(x) instead of φ(x). So one way to interpret what neural networks are doing is that, instead of using the original feature vector, we've kind of learned a smarter representation, and at the end of the day we're still doing a linear classification on top of that feature representation. People often think about neural networks as doing feature learning for precisely this reason. [00:12:10] And finally, now we can define the hypothesis class F as the set of all predictors, where a predictor is parameterized by a weight matrix V and a weight vector w, defined up here. We can let the weight matrix be any arbitrary k-by-d matrix, and we
let w be any d-dimensional vector; sorry, this d should actually be k, I will fix that. [00:12:45] Okay, so we have now defined a hypothesis class that corresponds to two-layer neural networks for classification. [00:12:55] Now we can kind of push this further, and we can talk about deep neural networks. So remember, going back to single-layer neural networks, a.k.a. linear predictors: we take the feature vector, we take the dot product with respect to a weight vector, and we get the score, which can be used to drive prediction directly in regression, or we take the sign to get classification predictions. For two-layer neural networks, we take φ(x), take the dot product with layer one's weight matrix, apply the elementwise activation function, and then take the dot product with a weight vector to get the score.
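A minimal sketch of this two-layer predictor, sign(w · σ(V φ(x))), using ReLU as the activation and, for illustration, the hand-set weights from the car example (in general V and w would be learned from data):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def predict(phi, V, w):
    # Two-layer neural network: h(x) = sigma(V phi(x)), f(x) = sign(w . h(x))
    h = relu(V @ phi)
    score = w @ h
    return 1 if score > 0 else -1      # convention: sign(0) = -1

V = np.array([[-1.0, 1.0, -1.0],       # k-by-d weight matrix (k=2, d=3)
              [-1.0, -1.0, 1.0]])
w = np.array([1.0, 1.0])               # k-dimensional weight vector

for x1, x2 in [(0, 2), (2, 0), (0, 0), (2, 2)]:
    print((x1, x2), predict(np.array([1.0, x1, x2]), V, w))
```

With these particular weights, the ReLU version still classifies all four XOR points correctly, since each hidden unit only needs to fire on its own side.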
[00:13:41] And now the key thing is that this piece, apply V and then apply sigma, is something you can just iterate over and over again. So here's a three-layer neural network: you take big φ(x), the feature vector, multiply by some matrix V1, apply a non-linearity, multiply by another matrix, apply a non-linearity, and finally you get some vector that you take the dot product with w, and you get the score, which can be used to power your predictions. [00:14:11] One small note: I've left out all the bias terms for notational simplicity; in practice, you would have bias terms. [00:14:21] Okay, and you can imagine just iterating this over and over again, but what is this doing? It kind of looks like a little bit of abstract nonsense: you're just multiplying by matrices and sending through non-linearities, and you hope something good happens.
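The "multiply by a matrix, apply sigma" iteration can be sketched as a loop (the weights here are made-up numbers for illustration; bias terms omitted, as in the lecture):

```python
import numpy as np

def score(phi, Vs, w, sigma=lambda z: np.maximum(z, 0)):
    # Deep network: repeatedly left-multiply by a weight matrix and apply
    # the activation; finally dot with the weight vector w to get the score.
    h = phi
    for V in Vs:                       # e.g. [V1, V2] gives a three-layer network
        h = sigma(V @ h)
    return float(w @ h)

# A tiny deterministic three-layer example:
phi = np.array([1.0, -2.0])            # d = 2 feature vector
V1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])            # 3-by-2
V2 = np.array([[1.0, 1.0, 1.0]])       # 1-by-3
w = np.array([2.0])
print(score(phi, [V1, V2], w))         # 2.0
```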
And you know, that's not completely false, but there are some intuitions which we can derive. [00:14:45] So one intuition is thinking about layers as representing multiple levels of abstraction. In computer vision, let's say the input is an image: you can think about the first layer as computing some sort of notion of edges; the second layer, when you multiply by a matrix and take a non-linearity, computes some notion of object parts; and in the third layer, you multiply by a matrix and apply some non-linearity and get some notion of objects. [00:15:21] Now, this is kind of just a story, and we haven't talked at all about learning, so this is definitely not true for all neural networks. It turns out that when you actually fit a network to data and visualize what the weights are, you actually do get some interpretable results, which is kind of interesting and somewhat
surprising. [00:15:42] So now there's a question of depth: the fact that you take a feature vector and apply some sort of transformation again and again to get a score. So why do we do this? One intuition that we've talked about already is that this represents different levels of abstraction, from kind of low-level pixels to high-level object parts and objects. Another way to think about this is that it's performing multiple steps of computation: just like in a classic program, if you get more steps of computation, it gives you more expressive power, you can do more things. You can think about each of these operations as simply doing some compute. Now, it's maybe a kind of foreign type of compute, because you're multiplying by a crazy unknown matrix, but the way we can think about this is that you set up this
computation, and the learning algorithm is going to figure out what kind of computation makes sense for making the best predictions. [00:16:46] Another piece of intuition is that empirically it just happens to work really well, which is not to be understated. If you're looking for a more theoretical reason, the jury is kind of still out on this: you can have intuitions about how deeper logical circuits can capture more than shallower ones, but then the relationship between circuits and neural networks requires a little bit of massaging, so this is still a pretty active area of research. [00:17:23] So to summarize: we started out with a very toy problem, the XOR problem, testing whether two cars are going to collide or not, and we used it to motivate problem decomposition and, eventually, defining neural networks. [00:17:39] We saw that
intuitively, neural networks allow you to define nonlinear predictors, but in this particular way: the way is to decompose the original problem into intermediate subproblems, testing whether one car is to the far right or the far left, and then combining them. And you can kind of take this idea further and iterate on this decomposition multiple times, giving rise to multiple levels of abstraction, multiple steps of computation. The hypothesis class is now larger: it contains all predictors where the weights of all the layers can vary freely. [00:18:23] Next up, we're going to show you how to actually learn the weights of a neural network. That is the end.
================================================================================ LECTURE 012 ================================================================================ Machine Learning 9 - Backpropagation | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=OcAF-l2xB9Y --- Transcript
[00:00:05] Hi, in this module I'm going to talk
about the backpropagation algorithm for computing gradients automatically. It's generally associated with training neural networks, but it's actually a far more general algorithm. [00:00:18] So let's begin with our motivating example: suppose we're doing regression with a four-layer neural network. So remember, we compute the loss on a given example, the loss with respect to a particular example, and now, as a function of the weights of the network, V1, V2, V3, and w, it is equal to the following. Remember the form of the neural network: you start with a feature vector, you multiply it by some weight matrix, which gives you a vector, and you send it through the activation function; you repeatedly apply this, apply a matrix, send it through an activation function, left-multiply by a matrix, send it through an activation function. Then you take the final vector and take the dot product with respect to the final weight vector, and
that gives you your [00:01:12] weight vector and that gives you your score [00:01:13] score so this is your prediction [00:01:16] so this is your prediction subtract [00:01:18] subtract the target value and you square it that [00:01:21] the target value and you square it that gives you your loss [00:01:24] gives you your loss so now if you wanted to train this [00:01:26] so now if you wanted to train this neural network using stochastic gradient [00:01:28] neural network using stochastic gradient descent you would need to [00:01:31] descent you would need to compute the gradient of this loss [00:01:33] compute the gradient of this loss function with respect to each of the [00:01:35] function with respect to each of the parameters so for example [00:01:38] parameters so for example would compute the gradient of the loss [00:01:40] would compute the gradient of the loss with respect to v1 that gives you a [00:01:42] with respect to v1 that gives you a gradient update which you can then use [00:01:44] gradient update which you can then use to update v1 same with v2 [00:01:47] to update v1 same with v2 3 [00:01:48] 3 and w [00:01:50] and w so now you can sit down with this lovely [00:01:53] so now you can sit down with this lovely expression [00:01:54] expression and you can just grind through the map [00:01:56] and you can just grind through the map and get the expressions it's [00:01:58] and get the expressions it's straightforward but it's rather tedious [00:02:01] straightforward but it's rather tedious another question is how can you get the [00:02:03] another question is how can you get the gradients without all doing all this [00:02:05] gradients without all doing all this manual work [00:02:09] so the answer to that is computation [00:02:12] so the answer to that is computation graphs [00:02:13] graphs so here is our loss function again [00:02:16] so here is our loss function again and what we're going to do is [00:02:17] and what we're going to do is write 
down the computation graph for [00:02:20] write down the computation graph for this mathematics computation graph is a [00:02:23] this mathematics computation graph is a direct acyclic graph whose root node [00:02:25] direct acyclic graph whose root node represents the final expression this [00:02:28] represents the final expression this loss function [00:02:29] loss function and each node [00:02:32] and each node represents intermediate sub expressions [00:02:34] represents intermediate sub expressions like one v of x for example [00:02:38] like one v of x for example now what this computation graph is going [00:02:40] now what this computation graph is going to allow us to do is [00:02:42] to allow us to do is allows us to apply the back propagation [00:02:44] allows us to apply the back propagation algorithm to the computation graph and [00:02:46] algorithm to the computation graph and automatically get gradients out [00:02:50] automatically get gradients out so there's two purposes actually that [00:02:53] so there's two purposes actually that we're going to do this the first is [00:02:56] we're going to do this the first is computing the gradients automatically [00:02:58] computing the gradients automatically and this is how deep learning packages [00:03:00] and this is how deep learning packages like tensorflow and pi torch work behind [00:03:03] like tensorflow and pi torch work behind the hood [00:03:04] the hood and second more of all [00:03:06] and second more of all we're going to use this as a tool to [00:03:08] we're going to use this as a tool to gain insight into the modular structure [00:03:11] gain insight into the modular structure of the gradients and try to demystify [00:03:14] of the gradients and try to demystify because taking gradients by hand you can [00:03:15] because taking gradients by hand you can lead it into situations where you just [00:03:17] lead it into situations where you just have a lot of symbols [00:03:19] have a lot of 
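As a concrete sketch of the motivating example, here is the forward computation of that four-layer loss in numpy. The layer sizes and numbers below are made-up assumptions, just to make the shapes line up:

```python
import numpy as np

def sigma(z):
    # Logistic activation, applied element-wise.
    return 1.0 / (1.0 + np.exp(-z))

def four_layer_loss(V1, V2, V3, w, phi_x, y):
    # Loss = (w . sigma(V3 sigma(V2 sigma(V1 phi(x)))) - y)^2
    h1 = sigma(V1 @ phi_x)
    h2 = sigma(V2 @ h1)
    h3 = sigma(V3 @ h2)
    score = np.dot(w, h3)   # the prediction
    return (score - y) ** 2

# Toy sizes (assumptions, for illustration only).
rng = np.random.default_rng(0)
phi_x = rng.standard_normal(4)
V1 = rng.standard_normal((3, 4))
V2 = rng.standard_normal((3, 3))
V3 = rng.standard_normal((3, 3))
w = rng.standard_normal(3)
print(four_layer_loss(V1, V2, V3, w, phi_x, y=1.0))
```

Writing out the gradient of this expression with respect to V1 by hand is exactly the tedious exercise the lecture is about avoiding.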
But using a graph, we can start to see the structure.

[00:03:26] Okay, so our starting point is to think about functions as boxes. Imagine you have this expression a + b, and that gives rise to some variable c. I'm going to represent this as a very simple computation graph, where you have a and b, and these arrows point into this box that does plus, and the result is labeled c. Okay, so now the question is: if I change a or b by a small amount, how much does c change? This is just the notion of a gradient. So informally, we can look at this as a + b = c. Now if I go and fiddle with a a little bit, I add epsilon; what happens to the right-hand side? Well, on the right-hand side I just get plus 1·epsilon. So the gradient of c with respect to a is 1, and I'm just going to write it on that edge. So this can be interpreted as a kind of amplification, or gain: if I move a by a little bit, this is the multiplicative factor that c moves by. Let's do the other side: a plus b, and you add a bit of noise to b, and again you get plus 1·epsilon, so the gradient of c with respect to b is 1 as well.

[00:05:09] Here's another example: c equals a times b. As a computation graph, a and b go into this box, which takes the product, and you get c. So what happens when you add epsilon noise to a? (a + epsilon) times b is equal to c plus b·epsilon, so therefore the gradient of c with respect to a is b. And analogously, we add the noise to b and we see that the contribution to the output c is a·epsilon, so the gradient there is a. Okay, so this all
should be kind of familiar: I've just cast the sum and product rules for differentiation in graphical form.

[00:06:11] So let's do a few more small examples. These small examples are going to be the building blocks; it turns out you can take these building blocks and compose them to build all sorts of more complicated functions. So here's the example we saw before, a + b, and the gradients are 1 and 1. For a − b, the gradients are 1 and −1, because if you add epsilon to b, then the difference is going to go down by epsilon. We saw the example a times b, and the gradients are b and a. If you look at the squared function a², the gradient with respect to the input is 2a, the power rule. Now let's consider max(a, b). Let's think about this one: if I add epsilon to a, how is that going to change the max? Well, if a is greater than b, then it's just going to change the max by epsilon; but if a is less than b, then the change is going to be 0, because b is going to be the max. So the gradient of max(a, b) with respect to a is the indicator function of whether a is greater than b, and symmetrically, the gradient with respect to b is the indicator of whether b is greater than a.

[00:07:49] And finally, here is the logistic function: you take a and send it through the logistic function σ, and a little bit of algebra, which I'll spare you, produces a quite elegant expression, which is σ(a) times (1 − σ(a)). And you can check that as a goes to infinity or minus infinity, the sigmoid is going to saturate at 1 or 0, which means this gradient is actually going to go to zero.
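This cheat sheet of building-block gradients can be checked against finite differences. A minimal sketch in plain Python (the inputs and step size are arbitrary choices):

```python
import math

def sigma(z):
    # Logistic function.
    return 1.0 / (1.0 + math.exp(-z))

# Cheat sheet: each entry is (function, analytic gradients w.r.t. a and b).
blocks = {
    "add": (lambda a, b: a + b,     lambda a, b: (1.0, 1.0)),
    "sub": (lambda a, b: a - b,     lambda a, b: (1.0, -1.0)),
    "mul": (lambda a, b: a * b,     lambda a, b: (b, a)),
    "max": (lambda a, b: max(a, b), lambda a, b: (float(a > b), float(b > a))),
}

a, b, eps = 1.5, -0.7, 1e-6
for name, (f, grad) in blocks.items():
    ga, gb = grad(a, b)
    # Finite-difference approximation of each gradient.
    fa = (f(a + eps, b) - f(a, b)) / eps
    fb = (f(a, b + eps) - f(a, b)) / eps
    assert abs(fa - ga) < 1e-4 and abs(fb - gb) < 1e-4, name

# Unary blocks: square (gradient 2a) and logistic (gradient sigma(a)(1 - sigma(a))).
assert abs(((a + eps) ** 2 - a ** 2) / eps - 2 * a) < 1e-4
assert abs((sigma(a + eps) - sigma(a)) / eps - sigma(a) * (1 - sigma(a))) < 1e-4
print("all building-block gradients check out")
```

Each assertion is exactly the "fiddle with the input by epsilon" experiment from the boxes above.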
So that's just a simple sanity check. Okay, so these are the basic building blocks, and that's really all the brute-force differentiation we're going to do; the rest is just composition.

[00:08:40] So now we take these building blocks and we put them together. Here's a simple example: suppose you take a and square it, you get b, and then you take b and square it, and you get c. By the building blocks from the previous slide, we know that the gradient on this edge is going to be 2 times the input here, which is b, and the gradient along this edge is going to be 2 times a. Okay, so now, using these two, we can apply the chain rule from calculus to compute the gradient of c with respect to a, and this is going to be nothing more than the product of those two quantities. So in this case we get 2b times 2a, and remember that b is equal to a², so substitute that in and you get 4a³. And remember c is a⁴, so we can verify that this is indeed consistent with using the power rule.

[00:09:50] So in general, you can compute these gradients by simply taking the product along edges, and that's going to be really useful on this slide. Okay, so now let's turn to our first example: the hinge loss for linear classification. We actually did this one before, but I just want to do it again through the lens of computation graphs. So here is the loss function, Loss(x, y, w) = max(1 − (w · φ(x)) y, 0), and given this loss function I'm going to construct the computation graph and then compute the gradient of the loss with respect to w. So working bottom up, we have the weight vector and we have the feature vector, and we take the dot product; that gives us the score. We take the score and we take y, and we multiply
them together, and that gives us the margin. Then we compute 1 minus the margin, and you take the max of that and 0, and you get the loss. So another nice thing about the computation graph is that it allows you to annotate these subexpressions and see what the pieces of the computation are.

[00:11:04] Okay, so now let us compute the gradient of the loss with respect to w. And what I'm going to do here is: all I need to do is compute the gradients along all these edges from the loss down to w. So let's begin at the top. Here is our cheat sheet; don't forget the cheat sheet. We just pattern match. Here's a max over two things; what's on this edge? The indicator of the first thing being greater than the second thing. So the gradient here is going to be [first thing, which is 1 minus the margin, greater than the second thing, which is 0]. And what about this edge? Here there's a minus 1, so let's put a minus 1. What about this times? For times, the gradient is the second input, and the second input here is y. And here's another times, where the second input is φ(x). So this allows us to think about the gradients one piece at a time, and all the little edges are just applications of this cheat sheet.

[00:12:20] Okay, now we're ready to read off the gradient of the loss with respect to w, and this is just going to be the product of all the edges. So first we have [1 minus the margin greater than 0], which I'm going to rewrite as [margin less than 1]; verify that's the same thing. Then we have the minus sign here, and then we have y, and then we have φ(x). You multiply them all together, and that's the expression. You can verify that this is indeed the gradient of the loss function.
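Reading off the product of edges gives ∇w Loss = −1[margin < 1] · y · φ(x). A small numpy sketch with a finite-difference check (the example numbers are arbitrary):

```python
import numpy as np

def hinge_loss(w, phi, y):
    # Loss = max(1 - (w . phi) y, 0)
    return max(1.0 - np.dot(w, phi) * y, 0.0)

def hinge_grad_w(w, phi, y):
    # Product along the edges: 1[margin < 1] * (-1) * y * phi(x)
    margin = np.dot(w, phi) * y
    return -float(margin < 1.0) * y * phi

w = np.array([0.5, -0.2])
phi = np.array([1.0, 2.0])
y = 1.0  # label in {+1, -1}

g = hinge_grad_w(w, phi, y)
eps = 1e-6
for i in range(len(w)):
    # Perturb one coordinate of w and compare to the analytic gradient.
    w_eps = w.copy()
    w_eps[i] += eps
    fd = (hinge_loss(w_eps, phi, y) - hinge_loss(w, phi, y)) / eps
    assert abs(fd - g[i]) < 1e-4
print(g)
```

Note that when the margin is at least 1, the indicator is 0 and the whole gradient vanishes, which matches the flat part of the hinge.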
[00:13:06] So in summary: we constructed the computation graph, we applied this cheat sheet to the individual edges, and then you just multiply them all together.

[00:13:20] And just as another note, remember that the gradient with respect to w is really about perturbations: if you change w by a little bit, how much is the loss going to change? And the change is going to be the product of all these amplifications, evaluated at a particular point.

[00:13:45] All right, so now let's do neural networks. This is not going to be anything really new; it's just going to be a different example. I'm going to do a two-layer neural network, and we're going to again build the computation graph up. So we have the feature vector, you have the first-layer weight matrix V, and you take the product. Then you stick this through the activation function, and we're going to label that h, which is the hidden vector. And now we're going to take the dot product of w and h; that gives you the score. And then the score minus y is the residual, and the residual squared is the loss.

[00:14:39] Another aside is that the computation graph really allows you to see this modularity visually. The part up here is just the squared loss, and the part down here is any way of computing a score. Before we had a linear predictor; now we have a two-layer neural network; it could be a four-layer neural network, and the top of the computation graph is just the same. Okay, so that's the computation graph. Now, to perform stochastic gradient descent, we need to compute the gradient with respect to both w and V.
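Before computing those gradients, here is a sketch of this two-layer forward computation in numpy (the matrix and vector values are assumptions, chosen only for illustration):

```python
import numpy as np

def sigma(z):
    # Logistic activation, element-wise.
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_loss(V, w, phi_x, y):
    h = sigma(V @ phi_x)      # hidden vector
    score = np.dot(w, h)      # the prediction
    residual = score - y
    return residual ** 2

V = np.array([[0.1, 0.2], [0.3, -0.1]])
w = np.array([1.0, -1.0])
phi_x = np.array([2.0, 1.0])
print(two_layer_loss(V, w, phi_x, y=1.0))
```

Swapping the body of `two_layer_loss` below the squared-loss line is exactly the modularity point: the top of the graph does not care how the score was computed.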
[00:15:23] Okay, so let's compute the gradient with respect to w of the loss. What I'm going to do is look at the edges and compute the gradients. So here's our cheat sheet. Okay, what goes on this edge, the gradient of the square? That is just 2 times the input, which in this case is 2 times the residual. What about this edge? The residual is the score minus y, so with respect to the score this should just be a 1 on here. And then what about this edge? This is just going to be the second input right here, so that is h. Okay, so now multiply all these things together and you get the gradient of the loss function with respect to w.

[00:16:24] All right, so one thing you can double check: we did do the gradient of the squared loss for linear predictors before, and it was also 2 times the residual
times the feature vector; except instead of φ(x), we now just have h, which is a kind of stand-in for the feature vector as far as w is concerned. So that's a nice sanity check.

[00:16:51] All right, so now let's do the more complicated one: we want to compute the gradient with respect to V of the loss of all the arguments. Let's fill in all the edges. So first of all, notice that these two edges are actually in common with this path, so we can go ahead and write them down. One cool thing about computation graphs is that they allow you to see the shared structure: the gradients themselves also have common subexpressions.

[00:17:31] Okay, so now we need to do more work here. The gradient on this edge is going to be the other input, which is w. This one is σ, so the gradient is going to be σ of the input times 1 minus σ of the input, and this is going to just be h ∘ (1 − h). This hollow circle here represents the element-wise product of vectors: you just take two vectors and multiply the elements together. And this is because this function is applied element-wise. And then what about this final edge? This is just going to be φ(x), which is just the other input.

[00:18:20] And now we can just multiply all of these things together, so we have 2 times the residual, times w ∘ h ∘ (1 − h), times φ(x) transpose. There's a slight annoyance here, because here we have a φ(x) transpose, whereas before there was no transpose, because we just had w dot something, and w dot is the same as w transpose. But the high level is that the product of all of these green pieces yields the gradient of the loss with respect to V. All right, so that finishes up this example.
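Putting the pieces together for this two-layer example, here is a sketch of both gradients, read off as products along the edges, with a finite-difference check (the values are arbitrary):

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(V, w, phi_x, y):
    h = sigma(V @ phi_x)
    return (np.dot(w, h) - y) ** 2

def gradients(V, w, phi_x, y):
    # Forward pass, remembering the intermediate values.
    h = sigma(V @ phi_x)
    residual = np.dot(w, h) - y
    # Products along the edges of the computation graph.
    grad_w = 2 * residual * h
    grad_V = 2 * residual * np.outer(w * h * (1 - h), phi_x)
    return grad_w, grad_V

V = np.array([[0.1, 0.2], [0.3, -0.1]])
w = np.array([1.0, -1.0])
phi_x = np.array([2.0, 1.0])
y = 1.0

grad_w, grad_V = gradients(V, w, phi_x, y)

# Finite-difference check on one entry of each gradient.
eps = 1e-6
w_eps = w.copy(); w_eps[0] += eps
assert abs((loss(V, w_eps, phi_x, y) - loss(V, w, phi_x, y)) / eps - grad_w[0]) < 1e-4
V_eps = V.copy(); V_eps[1, 0] += eps
assert abs((loss(V_eps, w, phi_x, y) - loss(V, w, phi_x, y)) / eps - grad_V[1, 0]) < 1e-4
print("two-layer gradients match finite differences")
```

The `np.outer` call is where the φ(x) transpose shows up: the gradient with respect to a matrix V is itself a matrix, one entry per weight.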
[00:19:15] So now, we have mainly used this graphical representation to visualize the computation of function values and gradients. But the promise of back propagation is that we don't have to do any of that by hand at all; I just did that to illustrate the inner workings of gradient computations on the computation graph. Now we're going to introduce the back propagation algorithm, which is a general procedure for computing these gradients, so we never have to worry about it.

[00:19:47] I'm going to demonstrate back propagation on a simple example, which is just the squared loss for linear regression. And one note: previously we've worked with symbolic expressions, but the actual algorithm is going to operate on numbers. So what I'm going to do is work with a concrete example and walk through the back propagation algorithm on it.

[00:20:19] The back propagation algorithm consists of two steps: a forward step and a backward step. In the forward step, we compute a bunch of forward values, going from the leaves to the root, and each forward value is simply the value of the subexpression rooted at that node. The value could be a scalar, a vector, or a matrix. So let's walk through this example. At the leaves we have w, which is [3, 1], and we have the feature vector, which is [1, 2]. Now if you take these two quantities and take the dot product, you get 3 plus 2, which is 5. And now you take the score, 5, and you take y, you subtract them, and you get the residual, which is 3. Notice that the forward value of this node is 5 and the forward value of this node is 3. And now, finally, you
square this, [00:21:31] and the value of the square is 3 squared, which is 9, so that's the value at this node. [00:21:40] Okay, so now we're done with the forward phase. All we've done is evaluate the loss, but importantly we have also remembered all the values along the way, which will come in handy. [00:21:54] So now, in the backward step, we're going to compute a backward value g_i for every node, [00:22:05] and this is going to be the gradient of the loss with respect to the value at that node: if that node changes value, how does the loss change? [00:22:16] So the backward pass is going to compute the values from the root to the leaves. [00:22:21] Let's do this first for this example. The base case: the gradient of the loss with respect to the loss is one. [00:22:30] And now we look at the gradient on this edge; we did this before, it's just two times the residual. [00:22:38] Okay, so now
we need to compute the backward value of this node. [00:22:45] To do that, we're going to take the backward value of the parent and multiply by whatever is on this edge. What's on this edge is two times the residual; the residual is three, so it's two times three, which is six, and so one times six is six. [00:23:04] Notice that in computing this backward value I'm using the intermediate computations from the forward pass. [00:23:13] Okay, so let's continue. The gradient on this edge is 1, so the backward value here is 6, which is the parent's backward value times what's on this edge, which is 1; that gives us 6.
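The two passes walked through here can be reproduced numerically. A minimal sketch, assuming y = 2 (the lecture gives the score as 5 and the residual as 3, which fixes y):

```python
import numpy as np

# Worked example from the lecture: loss = (w . phi - y)^2,
# with w = [3, 1] and phi = [1, 2]; y = 2 is assumed (score 5, residual 3).
w = np.array([3.0, 1.0])
phi = np.array([1.0, 2.0])
y = 2.0

# Forward pass: compute and remember the value at every node.
score = w.dot(phi)        # 3*1 + 1*2 = 5
residual = score - y      # 5 - 2 = 3
loss = residual ** 2      # 3^2 = 9

# Backward pass: multiply edge gradients from the root down to the leaves.
d_loss = 1.0                         # base case: dloss/dloss = 1
d_residual = d_loss * 2 * residual   # edge gradient of squaring: 2 * residual
d_score = d_residual * 1.0           # edge gradient of the subtraction: 1
d_w = d_score * phi                  # edge gradient of the dot product: phi

print(loss)  # 9.0
print(d_w)   # [ 6. 12.]
```

Note how the backward pass reuses `residual`, a value remembered from the forward pass.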
And then the backward value of this node is 6 times what's on this edge, which is this other input, (1, 2), and that gives us (6, 12). [00:23:41] So to conclude, the backpropagation algorithm takes these concrete values and this expression, and produces the gradient of the loss with respect to w evaluated at these concrete values, and that's (6, 12). [00:24:00] And the backpropagation algorithm, remember, works for any computation graph: four-layer neural networks, much more complicated models. This is just a simple example to show you the dynamics of the forward pass and the backward pass. [00:24:16] Okay, so now that we have the backpropagation algorithm, we compute gradients, we stick these gradients into stochastic gradient descent, and then we just run SGD with those gradients and we get some weights. [00:24:33] So now one question is, what do we get? We wanted to optimize the training
loss using stochastic gradient descent, [00:24:42] but if we run stochastic gradient descent, does it actually minimize this loss? This is a little bit of a delicate question. [00:24:52] For linear predictors, it turns out that the training loss for a convex loss function is going to be a convex function, which means that it is going to have a single global minimum, [00:25:09] which means that if you start at some point and you just follow your nose by running gradient descent with an appropriate step size, it's going to converge to the global optimum. [00:25:19] But for neural networks, the training loss is non-convex, which means that there are no guarantees at all that you're going to converge to the global minimum; if you're lucky, you converge to a local minimum. [00:25:35] So optimization of neural networks is in principle hard, but of course people do it anyway, and you
actually get some good results. [00:25:46] So there's a gap between theory and practice which is not quite understood yet. [00:25:54] But in practice, getting neural networks to train properly is a little bit of an art. I think of it as kind of like driving stick: there are just a lot of degrees of freedom, you can stall and get stuck, but if you know what you're doing you can actually get a lot of good results. [00:26:12] So here are some examples just to give you a flavor of what needs to be done. [00:26:18] Here is a neural network and here is the loss function. [00:26:23] The first point is that initialization matters. If you have a convex function, wherever you initialize, if you run it for long enough you converge to the global optimum. [00:26:32] For a non-convex function, if you initialize here you might get stuck up here; if you initialize over here you'll get stuck here, and so on. [00:26:39] So generally you have to be a
little bit careful about how you initialize. [00:26:42] You can't initialize at zero, because it turns out that all the rows of your weight matrix are going to be identical, which is not very useful. [00:26:56] So you typically initialize around zero with some amount of random noise, or you can use pre-training to initialize your network as well, which we won't cover right now. [00:27:10] Another thing that people do is called overparameterization. Here this corresponds to adding more hidden units than you really need, which corresponds to having a lot of rows in this matrix. [00:27:26] And the idea here is that the more hidden units you have, the more quote-unquote chances you have of having the network learn something reasonable from your data. Some of the units might die off and not be very useful, but maybe some fraction of them will actually be useful.
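The point about zero initialization leaving all rows of the weight matrix identical can be demonstrated directly. A minimal sketch with a tiny one-hidden-layer network; the sigmoid activation, squared loss, and all the numbers here are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Tiny network: score = w . sigmoid(V x), loss = (score - y)^2.
x, y = np.array([1.0, 2.0]), 1.0
V = np.zeros((4, 2))   # symmetric initialization: all rows identical
w = np.ones(4)

for _ in range(100):
    h = sigmoid(V @ x)
    residual = w @ h - y
    # Backprop by hand through the squared loss.
    d_h = 2 * residual * w
    d_V = np.outer(d_h * h * (1 - h), x)
    d_w = 2 * residual * h
    V -= 0.1 * d_V
    w -= 0.1 * d_w

# Every row of V received the identical gradient at every step,
# so all four rows are still identical after training:
print(V)
```

Replacing the zero initialization with small random noise, for example `V = 0.01 * np.random.default_rng(0).standard_normal((4, 2))`, breaks this symmetry and lets the hidden units specialize.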
And the final thing that people do is use adaptive step sizes, which are generally extensions of stochastic gradient descent. [00:27:55] Remember, in stochastic gradient descent we had a single step size eta which controlled how fast you move. [00:28:02] With methods like AdaGrad or Adam, you actually get a per-feature, or per-parameter, step size, so for every weight you get a number which dictates how fast you should be moving in that direction, and this generally leads to better results. [00:28:22] Okay, so one maybe high-level thing to keep in mind is: don't let your gradients vanish or explode. [00:28:30] If I explain this, it will become kind of clear. When you run gradient descent or stochastic gradient descent, if your gradients vanish, which means they become too small or close to zero, then you'll get stuck and you won't make progress.
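The per-parameter step size idea behind AdaGrad can be sketched as follows. This is a simplified AdaGrad update with toy gradients (Adam adds momentum-style averaging on top, which is omitted here):

```python
import numpy as np

def adagrad_update(w, grad, sum_sq, eta=0.1, eps=1e-8):
    """One AdaGrad step: each weight's effective step size is
    eta / sqrt(sum of its squared past gradients)."""
    sum_sq = sum_sq + grad ** 2
    w = w - eta * grad / (np.sqrt(sum_sq) + eps)
    return w, sum_sq

w = np.zeros(2)
sum_sq = np.zeros(2)
for _ in range(10):
    grad = np.array([10.0, 0.1])   # toy gradients: coordinates differ by 100x
    w, sum_sq = adagrad_update(w, grad, sum_sq)

# Despite the 100x difference in gradient magnitude, the accumulated
# normalization makes both coordinates move at essentially the same rate.
print(w)
```

This is the sense in which every weight gets its own number dictating how fast to move in that direction.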
But if your gradients become too large, then you'll just explode, and you will oscillate and might diverge. [00:28:56] So careful initialization, careful setting of the step sizes, and even the design of the neural network architecture: all of this is around making sure that your gradients don't vanish or explode. [00:29:11] Okay, so that's all the guidance I'll provide; there's a lot more to be said on this topic, and we're just giving you a high-level overview. [00:29:22] Okay, so let's summarize. The most important topic of this module is that of a computation graph. [00:29:32] This allows you to represent arbitrary mathematical expressions, and these expressions are built out of simple building blocks. I hope that the idea of computation graphs will allow you to get a better visual
understanding of what your mathematical expressions are doing, and also what gradient computations are about. [00:29:54] And then we saw the backpropagation algorithm, which is this general-purpose algorithm for leveraging the computation graph to compute the gradients. [00:30:06] So notice that we've done this in the context of neural networks, but I stress that computation graphs and backpropagation are fully general: they allow you to handle many, many functions. [00:30:21] And the generality is one of the reasons it allows you to iterate very quickly on new types of models and loss functions, and it opens up this new paradigm for model development, differentiable programming, which we'll talk about in a future module. [00:30:39] All right, that's it. Thanks.
================================================================================ LECTURE 013 ================================================================================
Machine Learning 10 - Differentiable Programming | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=c5btEEisp_g
---
Transcript
[00:00:05] Hi,
in this module I'm going to briefly introduce the idea of differentiable programming. [00:00:10] Differentiable programming kind of just runs off with the ideas of computation graphs and backpropagation that we developed for simple neural networks. [00:00:19] There's really enough to say here to fill up an entire course, at least, so I'm going to try to keep things pretty high level, but I will try to highlight the power of composition. [00:00:32] So differentiable programming is closely related to deep learning; I've adopted the former term as an attempt to be more precise in highlighting the mechanics of writing models as you would code. [00:00:48] If you look around at deep learning today, there are some pretty complex models, which have many layers, attention mechanisms, residual connections, to name a few, and this could be quite overwhelming at first glance. [00:01:01] But when you look closer, you'll notice
that these complex models are actually composed of functions, and these functions themselves are composed of smaller functions. [00:01:10] So this is the programming part of differentiable programming, which allows you to build up increasingly sophisticated models without losing track of what's going on. [00:01:24] So let's begin with our familiar example, the three-layer neural network. [00:01:31] Remember that in a three-layer neural network we start with our feature vector, in this case a six-dimensional vector, and we left-multiply by a matrix. [00:01:41] I've drawn some lines here to help us interpret this matrix as a set of rows, so each row corresponds to a hidden unit, and I'm going to take the dot product of each row with the input vector to produce a hidden vector of dimension four. [00:02:00] I'm going to add a bias term, and then I'm going to apply an activation function element-wise, for
example the ReLU or the logistic. [00:02:10] Now I have a vector, and now I can do the same thing again: I apply a matrix, add a bias term, apply an activation function, then apply a matrix which happens to be a vector, so I get a scalar, and I add a simple scalar bias term and I get a score, which I can then happily use to drive regression, or take the sign of to drive classification. [00:02:37] So what I want to do now is factor out this complex-looking expression into a reusable component which I'm going to call FeedForward. [00:02:49] We're going to see a lot of these box diagrams, which represent functions that we can reuse and that have a nice interface. [00:02:58] So the FeedForward function takes in an input vector x and produces an output vector, which could be of a different dimensionality, [00:03:09] and the way to interpret what FeedForward is doing is that it performs one step of processing. In particular, what
that processing is, is taking this input vector, multiplying by a matrix, adding a bias term, and applying an activation function. [00:03:28] So this is a function, or a program, but unlike in normal programming it's underspecified, because the red numbers here are parameters which are private to this function and which are going to be set and tuned later via backpropagation. [00:03:46] So now we can write our three-layer neural network using FeedForward, and the way I'm going to do this is: the score is equal to, you take phi of x, and you apply FeedForward of FeedForward of FeedForward, and you can write this as FeedForward cubed, so as to be more compact. [00:04:12] So this is a very compact way of writing something that would otherwise be quite complicated. [00:04:23] So now let's suppose we want to do image classification. We need some way of representing images.
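The FeedForward block and the FeedForward-cubed composition can be sketched in code. A minimal sketch; the dimensions, the ReLU choice, and the random initialization are assumptions, and unlike the lecture's final layer (a plain dot product), the last block here also applies ReLU for uniformity:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_feedforward(d_in, d_out):
    """One FeedForward block: x -> relu(W x + b).
    W and b are the private 'red number' parameters, to be tuned later."""
    W = 0.01 * rng.standard_normal((d_out, d_in))
    b = np.zeros(d_out)
    return lambda x: np.maximum(0, W @ x + b)

# Three-layer network as a composition: score = FeedForward^3(phi(x)).
f1 = make_feedforward(6, 4)   # six-dimensional feature vector -> hidden vector of 4
f2 = make_feedforward(4, 4)
f3 = make_feedforward(4, 1)   # collapse to a scalar score

phi_x = np.ones(6)
score = f3(f2(f1(phi_x)))[0]
print(score)
```

Each call to `make_feedforward` holds its own private parameters, mirroring the underspecified boxes in the diagram.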
The FeedForward function that we just introduced takes a vector as input, [00:04:35] and we can represent an image as a long vector, for example by appending all the rows. [00:04:42] But then we would have this huge matrix that we would need in order to transform this vector, resulting in a lot of parameters, which may make things difficult. [00:04:54] And the problem here is that we're not really using the spatial structure of images. For example, if I just permuted all the elements of this vector and retrained, I would basically get the identical model, so it's not paying attention to which pixels are close by. [00:05:17] To fix this problem, we introduce convolutional neural networks, which are a refinement of the fully connected network. [00:05:25] So here is an example of a ConvNet in action. [00:05:31] Here's a car, and you can see that it goes through a
number of layers, and over time it computes increasingly abstract representations of the image; at the end you get a vector representing the probabilities of the different object categories. [00:05:51] So if you want to play with ConvNets, you can actually click here for Andrej Karpathy's excellent demo, where you can create and train ConvNets in your browser. [00:06:03] Another comment: we're going to introduce ConvNets for 2D images, but they can also be applied to text or sequences, which are 1D, or videos, which are 3D. [00:06:18] So ConvNets have two basic building blocks. We're not going to go through the details; you can take CS231N if you want to learn all about ConvNets. Instead, I'm going to focus on the interface and show how these modules compose. [00:06:37] So the first is conv, and conv takes an image, and the image is going to be represented
as a volume, which is a collection of matrices, one for each channel: red, green, blue. [00:06:50] Each matrix has the same dimensionality as the image, height by width. [00:06:57] And what conv is going to do is compute another volume of a slightly different size. Usually the height and width of this volume are going to be equal to, or maybe slightly smaller than, the input volume, and the number of channels is going to be somewhat different. [00:07:16] The way that conv is going to compute this volume is via a sequence of filters, and intuitively what it's going to do is try to detect local patterns. [00:07:30] So here is one filter, and how it works is that I'm going to slide this filter across the image. [00:07:39] If I put the filter here, I'm going to line it up with the first pixels of the image, [00:07:47] and I'm going to compute the dot product between the
twelve numbers here and the twelve numbers here; I get a single number which I'm going to write into this entry. I slide the filter over a little bit, I write into the second entry, and so on. [00:08:06] And then the second filter I'm going to use to fill up the second output channel, so the number of filters is the number of output channels. [00:08:14] Okay, so that's all I'm going to say about conv. [00:08:18] The second operation is max pool, which again takes an input volume, and it produces a smaller output volume. [00:08:28] It's going to have the same number of channels, and for every slice through the volume it's going to slide a little max operation over every two-by-two or three-by-three region, [00:08:42] so the max over these four numbers is going to be used for this number, and so on. [00:08:50] Okay, so that's all I'm going to say about max pool.
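The conv and max pool interfaces just described can be sketched for a single channel. This is a toy sketch; real ConvNets slide filters over multi-channel volumes, and the 4x4 image and the filter here are arbitrary illustrative choices:

```python
import numpy as np

def conv2d_single(image, filt):
    """Slide one filter across a 2D image (valid positions only), taking the
    dot product at each position -- one filter fills one output channel."""
    H, W = image.shape
    fh, fw = filt.shape
    out = np.zeros((H - fh + 1, W - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+fh, j:j+fw] * filt)
    return out

def max_pool(channel, k=2):
    """Take the max over each k-by-k region, shrinking height and width."""
    H, W = channel.shape
    out = np.zeros((H // k, W // k))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = channel[i*k:(i+1)*k, j*k:(j+1)*k].max()
    return out

image = np.arange(16.0).reshape(4, 4)
filt = np.array([[1.0, -1.0]])          # toy filter detecting horizontal change
feat = conv2d_single(image, filt)       # local-pattern detection
pooled = max_pool(feat)                 # aggregation / downsampling
print(feat.shape, pooled.shape)         # (4, 3) (2, 1)
```

The two functions mirror the two roles the lecture names: conv detects local patterns, max pool aggregates them into a smaller volume.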
If you want to drill into the details, you can check out this demo, or you can learn more in CS231N. But again, I want to highlight that there are these two modules: one for detecting patterns and one for aggregating, to kind of reduce the dimensionality. [00:09:13] And with these two functions, along with feed-forward, now we can define AlexNet, which was the seminal CNN from 2012 that won the ImageNet competition and really transformed computer vision. So how this works is: I'm going to start with my input image, apply a convolutional layer, apply max pool, apply another convolutional layer followed by max pool, apply three more convolutional layers followed by max pool, and then apply three layers of feed-forward. Okay, so in one line I have AlexNet.
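The layer sequence above can be sketched as a composition of placeholder stages; the stage names below are stand-ins that just record the order in which they run, since the real parameters and shapes are unspecified here.

```python
# Schematic of the AlexNet-style composition: conv/pool alternating,
# then feed-forward layers. Layer counts follow the description above.
def make_pipeline(layers):
    def run(x):
        for layer in layers:
            x = layer(x)
        return x
    return run

conv = lambda name: (lambda x: x + [name])   # placeholder stages that
pool = lambda x: x + ["maxpool"]             # just record what ran
feedforward = lambda x: x + ["feedforward"]

alexnet = make_pipeline([
    conv("conv1"), pool,
    conv("conv2"), pool,
    conv("conv3"), conv("conv4"), conv("conv5"), pool,
    feedforward, feedforward, feedforward,
])
trace = alexnet([])
```

Running the pipeline on an empty list yields the execution order as a list of stage names.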
Now, of course, I've underspecified a couple of things here. One is I haven't specified the parameters; those are to be learned, and each of these functions holds its own private set of parameters that need to be learned. The second thing is I also haven't specified the hyperparameters, which is the number of channels, the filter sizes, and so on, which are actually pretty important for getting good performance. But I just wanted to highlight the overarching structure and the idea that you can compose in a fairly effortless way. [00:10:29] So now let's turn our attention to natural language processing. Here's a motivating example: suppose we want to build a question answering system. We have a paragraph (it's from Wikipedia), we have a question, and we want to select the answer from that passage, from the paragraph. This happens to be from the SQuAD question answering benchmark. So let's just read this: "In meteorology, precipitation is any product of the condensation of atmospheric water vapor that falls under gravity."
And the question is: what causes precipitation to fall? And the answer is gravity. So to do question answering, you have to do a fair amount of processing. You somehow have to relate the question with the paragraph, but it's not an exact match. Some of the words match, like "precipitation", but some of them are kind of more subtle; "causes", for example, is somehow related to "product". And there's also the fact that some words are ambiguous: "product" can be multiplication or output. So there's a lot of processing that needs to happen, and it's hard to kind of specify in advance. [00:11:54] So first things first: words are discrete objects, and neural networks speak vectors. So whenever you're doing NLP with neural nets, you first have to embed the words, or more generally tokens.
So we're going to define an embedToken function that takes a word or a token x and maps it into a vector. And all this function is going to do is look up the vector in a dictionary that has a static set of vectors associated with particular tokens. So this is fine, and if you have a sequence of words, then you can just embed each word into a vector to get a sequence of vectors. There's one problem, which is that the meaning of words and tokens depends on context, so this representation of the sentence is not going to be a particularly sophisticated one. [00:13:01] So what we're going to do is define an abstract function. Borrowing terminology from programming, an abstract function is something that has an interface but not an implementation. So a sequence model is going to be something that takes a sequence of input vectors and produces a corresponding sequence of output vectors.
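A minimal sketch of the embedToken lookup described above; the vocabulary and the 4-dimensional vectors are invented for illustration.

```python
import numpy as np

# embedToken as a static dictionary lookup, per the description above.
rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat"]
embedding_table = {w: rng.normal(size=4) for w in vocab}

def embed_token(x):
    # just look up the static vector associated with this token
    return embedding_table[x]

def embed_sequence(tokens):
    # embed each word independently -> a sequence of vectors
    return [embed_token(t) for t in tokens]

vectors = embed_sequence(["the", "cat", "sat"])
```

Note that each vector here is context-independent, which is exactly the limitation the sequence model is meant to fix.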
Each vector in this sequence is processed with respect to the other elements; in other words, I want to contextualize these vectors using the sequence models. I'm going to talk about two implementations of sequence models: one is recurrent neural networks, and one is transformers. So historically, recurrent neural networks have been around since the early 90s, and since 2011 or so they became really kind of the dominant paradigm for doing deep learning NLP. Transformers came out in 2017 and have really, I guess, transformed the landscape of deep learning NLP. [00:14:18] So an RNN, or a recurrent network, can be thought of as reading a sentence left to right; that's a kind of intuitive way to think about it. So we have a word which gets mapped into a vector.
That produces some hidden state. And then I'm going to read a second input vector, and I'm going to update this hidden state, along with this thing that I just read, into a new hidden state. And then I'm going to read another input vector, update the state, and repeat, again and again. Okay, so at the end of the day I have a sequence model, because it maps an input sequence into an output sequence. And notice that each vector here now depends on not just its input vector but everything to the left: if you look at h3, h3 depends on x3, x2, and x1 through this computation graph. So the intuition, again, is reading left to right, updating a hidden state as you go along; it's kind of like a memory. One thing I haven't specified is this function that takes an old hidden state and an input and updates the hidden state.
[00:15:47] So I'm going to do that next. There are two types of implementations I'm going to talk about; one is a simple RNN. So the contract here is: I'm going to have an old hidden state and an input, and we want to generate a new hidden state of the same dimensionality. And the way a simple RNN works is: I take the hidden state and multiply it by a matrix, take the input and multiply it by a matrix, add these two, and apply an activation function. So it's fairly simple, and one other way to think about this is that this is really the feed-forward function applied to the concatenation of h and x. Okay, so one problem with a simple RNN is that it suffers from the vanishing gradient problem: if you have long sequences, then the gradients start vanishing.
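The simple RNN update just described can be sketched as follows; the matrix names (V for the hidden state, U for the input) and the dimensions are illustrative choices, not the lecture's notation.

```python
import numpy as np

# Simple RNN cell: multiply the old hidden state by one matrix, the
# input by another, add, and apply a nonlinearity (tanh here).
rng = np.random.default_rng(0)
d_h, d_x = 3, 4                   # hidden and input sizes, arbitrary
V = rng.normal(size=(d_h, d_h))   # acts on the old hidden state
U = rng.normal(size=(d_h, d_x))   # acts on the new input

def simple_rnn_cell(h, x):
    return np.tanh(V @ h + U @ x)

# reading a sequence left to right, updating the hidden state as we go
h = np.zeros(d_h)
xs = [rng.normal(size=d_x) for _ in range(5)]
hs = []
for x in xs:
    h = simple_rnn_cell(h, x)
    hs.append(h)
```

Each hidden state has the same dimensionality as the previous one, so the cell can be applied for as many steps as the sequence is long.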
So LSTMs, or long short-term memory, were developed to solve this problem. [00:16:55] And the way this works is: the interface is the same, and the implementation is some rather involved thing that I'm not going to explain. But intuitively, you should black-box it and think about LSTMs as just a way to update the hidden state given a new input, but without forgetting the past. Remember, up here, for the simple RNN we could think of it as this feed-forward on x and h, which are treated kind of equally. LSTMs kind of privilege h and make sure that h doesn't get forgotten while going through this. Okay, so now we have our sequence model, an RNN, which produces a sequence of vectors, and the number of vectors depends on how long the input sequence is. [00:17:53] So suppose we want to do classification; we need to somehow collapse that into a single vector.
So I'm going to define this function collapse, which takes a sequence of vectors and returns a single vector. You can intuitively think about this as summarizing the collection of vectors as one. There are three common things you can do: you can simply take the first vector, you can take the last vector, or you can take the average of all the vectors. If you're doing text classification, you probably want to pick the average, so as to not privilege any individual word; but as we'll see later, if you're trying to do language modeling, you want to take the last.
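A sketch of collapse with the three options just mentioned:

```python
import numpy as np

# collapse: sequence of vectors -> single vector (first, last, or average)
def collapse(vectors, how="average"):
    vectors = np.asarray(vectors)
    if how == "first":
        return vectors[0]
    if how == "last":
        return vectors[-1]
    return vectors.mean(axis=0)  # average: no individual word privileged

seq = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
avg = collapse(seq)                # average of the three vectors
last = collapse(seq, how="last")   # what language modeling would want
```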
[00:18:34] So here is an example text classification model that we can develop. The score (let's say for binary classification) is going to be equal to: take the input sequence of tokens, embed all the tokens into a sequence of vectors, and now you can apply a sequence model, for example the sequence RNN. And you can do this three times; that gives you depth, just like we talked about for feed-forward networks. And now you can collapse that into a single vector and take a dot product to get a number out. So these types of functions, where the input and output have the same type signature, are really handy, because then you can compose them with each other and get multiple steps of computation. So recurrent neural networks generally work fairly well, but they suffer from one problem, which is that they're fairly local. So this is a problem that we're going to try to address with transformers. [00:20:02] So introducing transformers is fairly involved, so I'm going to step through and introduce a few things before actually defining it. The core part of a transformer is the attention mechanism.
The attention mechanism takes in a collection of input vectors and a query vector, and then outputs a single vector. Intuitively, what attention is doing is it's going to process y by comparing it to each of these x's. Okay, so mathematically, what this is doing is: you start with the query vector, and I'm going to multiply it by a matrix to reduce its dimensionality, in this case from six to three. I'm also going to take the X transpose, where each row here is one of the input vectors x1, x2, x3, x4, and I'm going to reduce its dimensionality, also to three dimensions. And now I can take the dot product between these x's and y. So that's going to give me a four-dimensional vector of dot products, intuitively measuring the similarity between the x's and the y. So now I can take those scores and I can turn them into probabilities by taking a softmax.
A softmax exponentiates the scores and normalizes them into a probability distribution. So now I have a distribution over the input vectors x1, x2, x3, x4; it's a four-dimensional vector. I can use those probabilities as weights when I multiply by X, to take a weighted combination of the columns of X here. So for intuition, if one of the inputs has a very high probability, let's say it's (0, 0, 1, 0), then I'm just going to pick out the third input vector. In general this is a distribution, so this is kind of like softly picking out which input vector is similar to y. Okay, and then finally I'm going to reduce the dimensionality to some lower-dimensional object.
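Putting the steps together, one attention head might be sketched like this in NumPy; the projection-matrix names and random values are stand-ins, while the sizes (four input vectors of dimension six, projected down to three) follow the running example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 4, 6, 3
X = rng.normal(size=(n, d))       # rows are input vectors x1..x4
y = rng.normal(size=d)            # query vector
Wq = rng.normal(size=(k, d))      # reduces y from 6 to 3 dims
Wk = rng.normal(size=(k, d))      # reduces each x from 6 to 3 dims
Wv = rng.normal(size=(k, d))      # final dimensionality reduction

def softmax(s):
    e = np.exp(s - s.max())       # exponentiate and normalize
    return e / e.sum()

def attention(X, y):
    scores = (X @ Wk.T) @ (Wq @ y)    # 4 dot products: similarity to y
    probs = softmax(scores)           # distribution over x1..x4
    chosen = probs @ X                # softly pick out a similar input
    return Wv @ chosen                # reduce to a lower dimension

out = attention(X, y)
```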
multiple attention [00:22:55] allows us to use multiple attention heads so i'm going to repeat this [00:22:57] heads so i'm going to repeat this process again [00:22:59] process again taking the query vector taking the input [00:23:02] taking the query vector taking the input vector comparing them getting a [00:23:04] vector comparing them getting a distribution over the input vectors and [00:23:06] distribution over the input vectors and using that distribution reweight the [00:23:08] using that distribution reweight the input vector so i'm selecting out softly [00:23:10] input vector so i'm selecting out softly in a vector and i multiply it by matrix [00:23:14] in a vector and i multiply it by matrix to reduce the dimensionality i've done [00:23:16] to reduce the dimensionality i've done this twice but in general you can do [00:23:18] this twice but in general you can do this um any [00:23:19] this um any for [00:23:20] for 16. [00:23:22] 16. so now i concatenate these vectors so i [00:23:24] so now i concatenate these vectors so i have a four dimensional vector from this [00:23:27] have a four dimensional vector from this computation four dimensional vector from [00:23:29] computation four dimensional vector from this computation i can concatenate them [00:23:31] this computation i can concatenate them into a eight dimensional vector and now [00:23:34] into a eight dimensional vector and now i can reduce the dimensionality back to [00:23:36] i can reduce the dimensionality back to the original [00:23:37] the original dimensionality that of the of the inputs [00:23:41] dimensionality that of the of the inputs okay so that was a kind of a very [00:23:44] okay so that was a kind of a very involved uh you know process but at the [00:23:46] involved uh you know process but at the end of the day you can think about this [00:23:49] end of the day you can think about this as taking y comparing it with the x's [00:23:52] as taking y comparing it with the x's and 
Okay, so that was a kind of very involved process, but at the end of the day, you can think about this as taking y, comparing it with the x's, selecting out the one that's most similar, and doing some dimensionality reduction in the process. [00:24:02] Okay, so that's attention. The transformer uses something called self-attention, which means that the query vector is actually going to be the input vectors themselves. So self-attention takes a sequence of input vectors, and then it's going to output a same-length sequence of output vectors, where for the first vector I'm going to stick x1 into the query slot for y and compute the attention, and then x2, and x3, and x4. So each of these output vectors is comparing a particular input vector with the rest of the input vectors and doing some processing. So in other words, I've basically generated a sequence of vectors where all n-squared pairs of the objects have been allowed to communicate with each other directly.
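Self-attention can then be sketched by reusing a single head and letting each input vector take a turn as the query; choosing square projection matrices here keeps the output dimension equal to the input dimension, which is an illustrative simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 6
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def attend(X, y):
    probs = softmax((X @ Wk.T) @ (Wq @ y))
    return Wv @ (probs @ X)

def self_attention(X):
    # stick each x_i into the query slot and compute attention
    return np.stack([attend(X, x) for x in X])

out = self_attention(X)   # same-length sequence of contextualized vectors
```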
So in contrast, with RNNs we have representations that have to proceed step by step, and the number of steps is the length of the sequence, which causes these long chains, which prevents fast propagation of information, whereas attention solves this problem. So one slight comment: I've been speaking very vaguely and intuitively about these things, trying to provide as much intuition as possible, and you can't really be more precise, because I'm again not specifying the actual computation; I'm only specifying the scope of possible computations that can be done once the parameters are learned from data. Okay, so that's the attention mechanism. You can think about this as a sequence model that just takes an input sequence and contextualizes the input vectors into output vectors.
[00:26:20] So there are two other pieces I need to talk about before I can fully define the transformer: layer normalization and residual connections. These are really kind of technical devices to make the final neural network easier to train. I'm going to package them up into something called AddNorm, and it also has the type signature of a sequence model, where I have an input sequence of vectors and I spit out the corresponding set of contextualized vectors. And the intuition behind this is: I'm going to apply f to x safely. So let me explain what that means. So AddNorm of f and x is equal to: I'm first going to take x and apply f to it. Okay, so why is that not good enough? Well, remember that these functions are underspecified, so at the beginning of training they're basically not doing anything, and so they're basically kind of junk. And if this is junk, then anything that I build on top of it is also going to be pretty junky.
So what I want to do is add a residual connection. A residual connection is a kind of escape hatch that allows x to be propagated through verbatim. So that means if f is junk, at least I have x. Then I'm going to add a layer norm function on top of this. Layer normalization is just a way to make sure that this vector is not too big or too small, because big vectors and small vectors result in exploding gradients or vanishing gradients, which stall training or make training diverge. Specifically, what layer norm does on a single vector is: it treats the components as a set of elements, subtracts the mean of those elements, and divides by the standard deviation, to kind of standardize the magnitude of the vectors.
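A sketch of AddNorm as just described: apply f, add the residual so x survives even if f is still "junk" early in training, then layer-normalize each vector. The function names here are mine, not the lecture's code.

```python
import numpy as np

def layer_norm(v, eps=1e-5):
    # subtract the mean of the elements, divide by the standard deviation
    return (v - v.mean()) / (v.std() + eps)

def add_norm(f, xs):
    # apply f to each vector "safely": residual connection, then layer norm
    return np.array([layer_norm(x + f(x)) for x in xs])

X = np.random.default_rng(0).normal(size=(4, 6))
out = add_norm(lambda v: np.zeros_like(v), X)  # a "junk" f that does nothing
# even with a junk f, the residual preserves a (normalized) copy of x
```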
[00:28:31] Okay, so in summary, AddNorm with a particular function is just applying f to x safely. Okay, so now I'm finally ready to define the Transformer block. This is again a sequence model that takes a sequence of input vectors and spits out a contextualized sequence of output vectors; intuitively, it's processing each x_i in context. [00:28:59] There's only one line here; we've actually already done most of the hard work. The Transformer block on a sequence of vectors is: you take x and apply attention, which allows all the vectors to talk to each other, and you wrap that in AddNorm to do it safely; and finally you apply feed-forward to each individual resulting vector independently, and you also wrap that in AddNorm to do it safely. [00:29:35] So that's it for a Transformer block.
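Putting the pieces together, a simplified Transformer block might look like the following NumPy sketch. The attention here is a toy single-head version without learned query/key/value projections, and the feed-forward layer uses a fixed random weight matrix; both are stand-ins of my own for the real, learned modules.

```python
import numpy as np

def layer_norm(v, eps=1e-5):
    return (v - v.mean()) / (v.std() + eps)

def add_norm(f, x):
    # residual connection plus layer norm: apply f "safely"
    return [layer_norm(xi + fi) for xi, fi in zip(x, f(x))]

def attention(x):
    # Toy self-attention: every vector attends to every other via a
    # softmax over scaled dot products (no learned projections here).
    X = np.stack(x)                                   # (n, d)
    scores = X @ X.T / np.sqrt(X.shape[1])            # (n, n)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return list(w @ X)

def feed_forward(x, seed=0):
    # Applied to each vector independently: a one-layer ReLU map with
    # a fixed random weight matrix standing in for a learned one.
    d = len(x[0])
    W = np.random.default_rng(seed).normal(size=(d, d))
    return [np.maximum(0.0, W @ xi) for xi in x]

def transformer_block(x):
    # Attention first lets the vectors talk to each other (safely),
    # then feed-forward processes each resulting vector (safely).
    x = add_norm(attention, x)
    return add_norm(feed_forward, x)
```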
[00:29:43] And now we have enough that we can actually build up to BERT, which was this complicated thing that I mentioned at the beginning. BERT is this large unsupervised pre-trained model which came out in 2018 and has really transformed NLP. Before, there were a lot of specialized architectures for different tasks, but BERT was a single model architecture that works well across many tasks. [00:30:08] This is the way it works for, say, question answering: you take the question, you concatenate it with the paragraph, and that gives you just a sequence of tokens. And what BERT does on a sequence of tokens is it's going to embed the tokens, and then it's just going to apply the Transformer block 24 times.
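The stacking itself can be sketched like this; `embed_token` and the block body are toy placeholders of my own, but the point is that a block maps a sequence of d-dimensional vectors to a sequence of the same shape, so it composes with itself.

```python
import numpy as np
import zlib

D = 8  # toy embedding dimension

def embed_token(token):
    # Stand-in embedding: a deterministic random vector keyed by the token.
    return np.random.default_rng(zlib.crc32(token.encode())).normal(size=D)

def transformer_block(x):
    # Placeholder with the right type signature (sequence of vectors in,
    # sequence of the same shape out); the real block combines attention,
    # feed-forward, and AddNorm as described in the lecture.
    return [v / (np.linalg.norm(v) + 1e-5) for v in x]

def bert(tokens, num_layers=24):
    x = [embed_token(t) for t in tokens]      # embed the tokens
    for _ in range(num_layers):               # stack the same block 24 times
        x = transformer_block(x)
    return x  # one highly contextualized vector per input token
```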
[00:30:42] So again, the nice thing about having a Transformer block where the input and output have the same dimensionality and type is that you can just layer it on and get much deeper networks. [00:30:55] Okay, so at the end of the day, BERT gives you a sequence of vectors which are highly contextualized and nuanced, and which contain a lot of rich information about the sentence. From there you can either use it to drive classification, let's say binary classification, directly by collapsing the vectors into one vector, or you can use it to select out an answer to the question; I'm not going to go into the details of how that works. [00:31:30] So far we've talked about how to design functions that can process a sentence, a sequence of tokens or vectors, but we can also generate new sequences. And the basic building block for generation is what I'm going to call generate token: you take a vector x and you generate a token y. This is kind of the reverse of
embed token, which takes a token and produces a vector. [00:32:04] The way generate token is going to work is that it's actually going to use embed token as a subroutine: it's going to look at all the possible candidate words that one could generate, embed those, and take the dot product with x to get some sort of similarity between the vector and a potential candidate generation. Now we have some scores; we apply the softmax to get a distribution over possible words, and then we can generate from that probability distribution.
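Generate token, as just described, might be sketched like this; the vocabulary and the embedding function are hypothetical stand-ins of my own.

```python
import numpy as np
import zlib

def embed_token(token, d=8):
    # Toy deterministic embedding keyed by the token's bytes.
    return np.random.default_rng(zlib.crc32(token.encode())).normal(size=d)

def generate_token(x, vocab, rng=None):
    # Score each candidate by the dot product of its embedding with x,
    # softmax the scores into a distribution, and sample a word from it.
    rng = rng or np.random.default_rng(0)
    scores = np.array([embed_token(w, d=len(x)) @ x for w in vocab])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return vocab[rng.choice(len(vocab), p=probs)]
```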
[00:32:38] So here, building on top of generate token, we can do language modeling, where the input is a sequence of words and the output is the next word. This is actually fairly simple, since we already have all the essential tools. Language modeling of x is: you take x, you embed the tokens, and the crucial step is that you stick it through a sequence model. Remember, a sequence model does fancy stuff: it turns this sequence of primitive vectors into contextualized vectors, which contain more information. [00:33:22] Then it collapses them, and this time you generally want to use the last vector, because that's closest to the word that you want to generate next. And that gives you just one vector, and you can use that to generate a token.
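With toy stand-ins (a hypothetical `embed_token`, a placeholder `sequence_model`, and a greedy variant of generate token), the language model reads:

```python
import numpy as np
import zlib

def embed_token(t, d=8):
    # Toy deterministic embedding keyed by the token's bytes.
    return np.random.default_rng(zlib.crc32(t.encode())).normal(size=d)

def sequence_model(x):
    # Placeholder contextualizer with the right type signature; a real
    # one would be an RNN or a stack of Transformer blocks.
    mean = np.mean(x, axis=0)
    return [xi + mean for xi in x]

def generate_token_greedy(v, vocab):
    # Greedy variant: take the argmax instead of sampling the softmax.
    return max(vocab, key=lambda w: embed_token(w) @ v)

def language_model(tokens, vocab):
    x = [embed_token(t) for t in tokens]   # embed
    x = sequence_model(x)                  # contextualize
    last = x[-1]                           # closest to the next word
    return generate_token_greedy(last, vocab)
```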
[00:33:44] Okay, so finally, we can take language models and build on top of them to create what is known as a sequence-to-sequence model. This is perhaps one of my favorite interfaces, because it's so versatile. The basic idea is that you have an input, which is a sequence, and you are trying to generate another sequence, the output; sequences are very general, and you can use them to encode basically any sort of discrete output. [00:34:19] And the way we're going to do that is just using a language model. Remember, a language model takes the sequence and predicts the next token. So I can start out with x, query the language model to generate the next token, then attach this token to the history, query the language model again to generate the next token, and so on and so forth until I am done. [00:34:52] This is by and large how a lot of the state-of-the-art methods work, for example in machine translation, generating a translated sentence given an input sentence, or in document summarization, or semantic parsing. Each of these can be framed as sequence-to-sequence tasks, based these days usually on BERT and Transformers.
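The decoding loop just described can be sketched directly; the language model is passed in as a function, and the stop-token name is my own choice.

```python
def seq2seq(x_tokens, language_model, max_len=20, stop="<eos>"):
    # Autoregressive generation: repeatedly query the language model
    # for the next token, attach it to the history, and repeat until
    # a stop token appears (or we hit a length limit).
    history = list(x_tokens)
    output = []
    while len(output) < max_len:
        y = language_model(history)
        if y == stop:
            break
        output.append(y)
        history.append(y)
    return output
```

For machine translation, `x_tokens` would be the source sentence and `output` the translation.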
[00:35:22] Okay, so that was a really quick and high-level whirlwind tour of different types of differentiable programs from deep learning. We started with, what in hindsight seems very simple, feed-forward networks. [00:35:40] Then we looked at images and at convolutional neural networks, which were built out of convolution layers and max-pooling layers, and also feed-forward layers; the nice thing about packaging things into modules is that the feed-forward module is actually used in Transformers, in different places, as well. [00:36:00] For text and sequences, we first have to embed them into a sequence of vectors, and then we have two choices: we can either use recurrent neural networks, or we can use Transformers, which are based on attention. We can use sequence models and collapse them into a vector to drive classification decisions, or we can use them to generate new sequences as well. [00:36:34] There are many details that are glossed over; in particular,
glossed over in particular the some of the architectures have been [00:36:40] the some of the architectures have been simplified so i encourage you to consult [00:36:42] simplified so i encourage you to consult the original source if you want the kind [00:36:44] the original source if you want the kind of the actual [00:36:45] of the actual um [00:36:46] um uh the full gory details another thing i [00:36:50] uh the full gory details another thing i haven't talked about is learning any of [00:36:51] haven't talked about is learning any of these models [00:36:53] these models it's going to be using some variant of [00:36:54] it's going to be using some variant of stochastic gradient descent but there's [00:36:56] stochastic gradient descent but there's often various tricks that are needed to [00:37:00] often various tricks that are needed to get it to work [00:37:01] get it to work but maybe the final thing i'll leave you [00:37:04] but maybe the final thing i'll leave you with is the idea that [00:37:07] with is the idea that all of these of differential programming [00:37:10] all of these of differential programming which is that all of these complex [00:37:11] which is that all of these complex models are built out of modules and even [00:37:15] models are built out of modules and even if you kind of don't understand or i [00:37:16] if you kind of don't understand or i didn't explain the details i think it's [00:37:19] didn't explain the details i think it's really important to pay attention to the [00:37:21] really important to pay attention to the kind of type signature of these um [00:37:25] kind of type signature of these um of these functions [00:37:26] of these functions as well as with an intuitive idea of [00:37:30] as well as with an intuitive idea of what each of these are doing [00:37:33] what each of these are doing okay so that ends this module thanks for [00:37:35] okay so that ends this module thanks for listening 
================================================================================ LECTURE 014 ================================================================================ Artificial Intelligence & Machine Learning 11 - Generalization | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=Gq-Ah-QrOQM --- Transcript [00:00:06] Hi, in this module I'm going to be talking about the generalization of machine learning algorithms. Recall that a machine learning framework has three design decisions. The first is the hypothesis class, which could be linear predictors or neural networks. The second design decision is the loss function, which in the case of regression could be the squared loss, and in the case of classification could be the hinge or logistic loss; if you take the losses and average them, you get the training loss, which is the training objective that we've so far been optimizing. [00:00:37] And finally, we have the optimization algorithm, which is either gradient descent or stochastic gradient descent. All good so far.
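For concreteness, those pieces can be written down for linear predictors (feature vector phi, weights w); this is a minimal NumPy sketch.

```python
import numpy as np

def squared_loss(w, phi, y):
    # regression: (w . phi - y)^2
    return (w @ phi - y) ** 2

def hinge_loss(w, phi, y):
    # classification with y in {-1, +1}: max(1 - margin, 0)
    return max(1 - (w @ phi) * y, 0)

def logistic_loss(w, phi, y):
    # classification: log(1 + exp(-margin))
    return np.log(1 + np.exp(-(w @ phi) * y))

def training_loss(w, examples, loss):
    # the training objective: the average loss over the training set
    return np.mean([loss(w, phi, y) for phi, y in examples])
```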
[00:00:46] Now let's take a step back and be a little more critical: is the training loss in particular a good objective to be optimizing? [00:00:57] So here is a little cartoon example of an algorithm that does really well on the training loss; it's called rote learning. The rote learning algorithm is just going to store all the training examples, and then it's going to return this predictor: the predictor takes an input x and searches for x in the training set; if it can find it, then it returns the corresponding y, and otherwise it just gives up (it segfaults). [00:01:27] Okay, so this learning algorithm minimizes the objective perfectly: it gets zero training loss. But you can kind of tell that it's a bad idea, because it doesn't get anything else right. [00:01:42] So this was an example of extreme overfitting. Here are some examples of less extreme overfitting, in pictures.
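The rote learner takes only a few lines to write down, which is part of the joke; in this sketch, giving up is an exception rather than a segfault.

```python
def rote_learn(training_examples):
    # Store the training set verbatim and return a lookup predictor:
    # zero training loss, no ability to generalize.
    table = dict(training_examples)
    def predictor(x):
        if x in table:
            return table[x]
        raise LookupError("unseen input: giving up")  # the "segfault"
    return predictor
```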
[00:01:50] Here's an example from classification: you can see that the green decision boundary tries really hard to separate the blue and the red points, and does so successfully, getting zero training error. But you can kind of intuitively sense that it's overfitting, and perhaps this black decision boundary would be better. [00:02:12] In the case of regression, this red curve gets zero training loss by going through all the training points, but you can see that it's overfitting, and instead maybe you should be capturing the broader trend using a simple line. [00:02:29] So in general, if you try to overly optimize the training loss, then you risk overfitting to the noise in the data. [00:02:38] So then what is the true objective, if it isn't the training loss? Well, to answer that question, let's take a step back and think about what we are trying to do. Machine learning is just a means; the end is a predictor that you're going to launch into the world to make
predictions on real people; this predictor just happens to be trained by a learning algorithm. [00:03:00] So how good is this predictor in the world? Well, the answer is that how good it is depends on how well it's able to predict on unseen future examples. So our true learning objective should be to minimize the error on unseen future examples. Sounds great; the only small problem is that we don't have access to the future, and in particular, if we don't see the examples, how can we do anything about them? [00:03:33] So often we settle for the next best thing, which is to get a test set. The test set is just a set of examples that you didn't use for training, so it is a surrogate for the unseen future examples. [00:03:46] I make this distinction because I want to stress the fact that when you deploy a predictor trained by a machine learning algorithm into the world,
[00:03:56] it might encounter all sorts of crazy things, and all you have during training in the lab is a test set. So what you're trying to do is have the test set be as close to, and as representative of, what you actually get in the real world as possible. [00:04:20] So now that we have an intuitive feeling for overfitting, can we make it a little bit more precise? In particular, when does a learning algorithm generalize from the training set to the test set? Because that's kind of what we've settled for. [00:04:37] There is a way to make this mathematically rigorous, but I just want to give you the framing of how to think about generalization. [00:04:49] So the starting point is f*. This is the ideal predictor: it predicts everything as correctly as you possibly could. It lives in the family of all predictors. Of
course, we can't get to f*. So what do we do? Well, we do two things. We first define a hypothesis class, script F, and then we have a learning algorithm that finds a particular predictor f-hat within this hypothesis class. [00:05:27] Another predictor I'm going to talk about is g. This is also the kind of thing that you can't get hold of: it's the best predictor that you can find in the hypothesis class. [00:05:43] So now we're interested in the difference between the error of the thing you have and the error of the thing that you wish you had. Mathematically, that's written as the error of the learned predictor f-hat minus the error of the ideal f*, and this difference can be decomposed into two parts. [00:06:04] The first part is the approximation error. The approximation error is the difference between g and f*; mathematically, that's
the error of g minus the error of f*. This measures how good your hypothesis class is. [00:06:29] The second part is the estimation error, which is the gap between f-hat and g: the error of f-hat minus the error of g. This measures how good the learned predictor is relative to the potential of the hypothesis class. [00:06:47] And you can verify this identity, because we're just subtracting the error of g and adding the error of g back: Err(f-hat) − Err(f*) = [Err(f-hat) − Err(g)] + [Err(g) − Err(f*)], so the right-hand side is equal to the left-hand side. This trivial identity highlights these two quantities, approximation error and estimation error, and gives us a language to talk about the trade-offs in generalization.
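With made-up error numbers, the decomposition is easy to check; the values below are hypothetical, chosen only to illustrate the identity.

```python
# Hypothetical error values for the three predictors.
err_f_star = 0.05   # ideal predictor over all possible functions
err_g      = 0.10   # best predictor within the hypothesis class
err_f_hat  = 0.18   # the predictor the learning algorithm found

approximation_error = err_g - err_f_star    # how good the class is
estimation_error    = err_f_hat - err_g     # how well we learned within it

# Subtracting and adding err_g changes nothing, so the two sides match.
gap = err_f_hat - err_f_star
assert abs(gap - (approximation_error + estimation_error)) < 1e-12
```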
[00:07:09] So let's get some more intuition about how approximation and estimation error behave as you increase the size of the hypothesis class. [00:07:21] When the hypothesis class grows, the approximation error will decrease. This is because the approximation error measures how good g is, and g is the best thing in the class; if you're adding more things, the best thing is only going to get better. In other words, you're taking a min over a larger set, which is bound to decrease. [00:07:48] The second thing that happens is that the estimation error increases when the hypothesis class grows. This is because it's harder to estimate something more complex: there are just more functions among which the learning algorithm has to figure out which one is the correct one, given the limited data. [00:08:10] There are ways to make this more precise using the tools from statistical learning theory, but I'll just leave it as intuition for now. [00:08:20] So given these trade-offs, what are the ways that we can use to control the size of the hypothesis
class? [00:08:28] So we're going to focus our attention on linear predictors. Remember, in linear predictors each predictor has a particular weight vector, so effectively the size of the set of weight vectors determines the size of the hypothesis class. [00:08:46] One thing you can do is reduce the dimensionality of the set of possible weight vectors. Pictorially, this looks like the following: imagine you have three features, so the set of weight vectors, where each weight vector is three-dimensional, is a ball. If you remove one feature, then you end up with a two-dimensional ball. Equivalently, this is saying that one of the features has to have zero weight, which you can think about as a restriction on the set of values that w can take.
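The "removing a feature is the same as forcing its weight to zero" picture can be checked directly; the numbers here are toy values of my own.

```python
import numpy as np

w   = np.array([0.4, -1.2, 0.7])   # a weight vector over three features
phi = np.array([1.0, 2.0, 3.0])    # a feature vector

# Restricting the hypothesis class: force the third weight to zero.
w_restricted = w * np.array([1.0, 1.0, 0.0])

# The restricted predictor behaves exactly as if the feature were removed.
assert np.isclose(w_restricted @ phi, w[:2] @ phi[:2])
```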
template selection. [00:09:35] You can do this manually, by adding feature templates, seeing if they help, and removing them if they don't; you're trying to manually figure out what small set of features actually gets you good accuracy. There are also ways to do this more automatically: you can do forward selection, boosting, or L1 regularization. This is beyond the scope of the class, but there are ways to make this less manual. [00:10:02] One thing I want to stress is that the dimensionality, the number of features, is the key quantity that matters, not the number of feature templates, and also not the complexity of each individual feature. Imagine you write a thousand lines of code to compute one feature: it's still a very simple hypothesis class, because it's just one feature, insofar as generalization is concerned.
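Forward selection, mentioned above, is easy to sketch. This is a minimal illustration, not the course's code: `train_and_validate` is a hypothetical helper that trains a predictor on the given feature templates and returns its validation error, and the `toy_error` function below is fabricated just to exercise the loop.

```python
def forward_selection(all_templates, train_and_validate):
    """Greedily add the feature template that most reduces validation error."""
    chosen = []
    best_err = train_and_validate(chosen)  # error with no features at all
    while True:
        candidates = [t for t in all_templates if t not in chosen]
        if not candidates:
            break
        # Try adding each remaining template; keep the most helpful one.
        errs = {t: train_and_validate(chosen + [t]) for t in candidates}
        t_best = min(errs, key=errs.get)
        if errs[t_best] >= best_err:
            break  # no remaining template helps, so stop
        chosen.append(t_best)
        best_err = errs[t_best]
    return chosen, best_err

def toy_error(templates):
    """Fabricated stand-in for train-then-measure-validation-error."""
    s = set(templates)
    err = 0.72
    if "entity" in s:
        err -= 0.50
    if "left" in s:
        err -= 0.05
    if "right" in s:
        err -= 0.05
    if "noise" in s:
        err += 0.02  # a harmful template is never kept
    return err

chosen, best = forward_selection(["entity", "left", "right", "noise"], toy_error)
```

Boosting and L1 regularization achieve a similar effect without the explicit search; L1's penalty drives individual weights exactly to zero, which amounts to dropping those features.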
[00:10:31] So the second strategy is controlling the norm, or the length, of this weight vector. Visually, this looks as follows: if you have a set of weight vectors which are bounded in length, you can shrink that length, and that results in a smaller circle, which is clearly a smaller set of weight vectors. This is probably the most common way to control the size of the hypothesis class. [00:11:08] There are two ways to control the norm. One is by regularization. Remember, the objective which we didn't like was minimizing the training loss of w, because that can lead to overfitting. So one way to regularize is to add a penalty term: lambda over 2 times the norm of w squared. Here lambda is a positive number which controls the strength of this penalty, and what this penalty does is it says: let's try to minimize the
training loss, [00:11:48] but we also want to keep the norm small, because we're taking a min over the sum here. [00:11:54] So if we look at what gradient descent does to this objective, we can interpret it as follows. Gradient descent, remember, initializes the weights, iterates over T epochs, and performs an update: w minus eta, the step size, times the gradient of the training loss. And now we take the gradient of this penalty, which is just lambda times w. Remember, we're subtracting eta times this, so if w is, let's say, (10, 10), then what we're going to do is subtract that vector and move the weights closer to zero, by an amount that depends on eta and lambda.
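As a sketch of that regularized update (the zero training-loss gradient below is a stand-in chosen to isolate the penalty's effect; everything here is illustrative, not the course's code):

```python
def gd_with_l2(grad_train_loss, w0, eta, lam, epochs):
    """Gradient descent on TrainLoss(w) + (lam/2) * ||w||^2.

    The gradient of the penalty (lam/2) * ||w||^2 is lam * w, so each
    update also shrinks the weights toward zero.
    """
    w = list(w0)
    for _ in range(epochs):
        g = grad_train_loss(w)
        w = [wi - eta * (gi + lam * wi) for wi, gi in zip(w, g)]
    return w

# With a zero training-loss gradient, the update reduces to
# w <- (1 - eta * lam) * w. One step from (10, 10) with eta = 0.1 and
# lam = 0.5 gives (9.5, 9.5); the weights decay toward zero.
w = gd_with_l2(lambda w: [0.0] * len(w), [10.0, 10.0], eta=0.1, lam=0.5, epochs=1)
```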
[00:12:46] So another way to control the norm is by early stopping. Early stopping is really easy to explain, so here it is: you run gradient descent as usual; you initialize w, you repeat for a number of epochs, and you perform the update. The only thing is that you're just going to reduce the number of epochs you go for. That's it. [00:13:07] This seems like a hack, and you can develop some theory about it, but the intuition is that when you start the weights at zero, that's the smallest norm, and when you update the weights over a number of iterations, the norm of w is actually going to grow. It's not obvious that this always happens, but empirically it's generally true. So by stopping gradient descent early, you're saying: don't let the norm of w get too big. The lesson here is that you're trying to minimize the training error, but you're not trying too hard, because you're just going to call it quits after a while.
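The norm-growth intuition can be seen on a made-up one-example least-squares problem (this toy loss is my own, not from the lecture): starting from w = 0, the norm of w grows with the number of epochs, so capping the epochs caps the norm.

```python
def norm(w):
    return sum(wi * wi for wi in w) ** 0.5

def train(epochs, eta=0.1):
    """Gradient descent from w = 0 on a tiny least-squares problem:
    one example x = (1, 1), target y = 10, loss (w . x - y)^2 / 2."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        residual = w[0] + w[1] - 10.0          # w . x - y
        w = [wi - eta * residual for wi in w]  # gradient step
    return w

# Fewer epochs -> smaller norm: stopping early keeps ||w|| small.
norms = [norm(train(t)) for t in (0, 1, 5, 50)]
```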
[00:13:46] Okay, so let's summarize now. We started by saying that the training loss is not the true objective: the real objective is minimizing the loss on unseen future examples. Unfortunately, we don't have access to that, so we're going to settle for the loss on some test data, which serves as a surrogate for the unseen examples. Then we studied approximation and estimation error as a way to understand generalization, and it's always going to be a balancing act between fitting the training error and not letting your hypothesis class grow too big. And the mantra to end with is: perhaps just keep it simple. So right now we've introduced a bunch of knobs for controlling the size of the hypothesis class; next we'll see how to actually turn them.

================================================================================ LECTURE 015 ================================================================================ Artificial Intelligence & Machine Learning 12 - Best Practices | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=ouvGV2YZEEM --- Transcript

[00:00:05] So we've spent a lot of time talking about the formal principles of machine learning. In this module I'm going to talk more about the empirical aspects
of machine learning practice. [00:00:17] So recall the three design decisions for a machine learning algorithm: first you set up the hypothesis class, then the training objective, and then the optimization algorithm. And each of these design decisions itself has a bunch of choices. [00:00:35] For the hypothesis class, you have to specify the feature extractor phi: do you use linear features or quadratic features? You also have to specify the architecture: do you use a linear predictor, a one-layer neural network, or a two-layer neural network, and how many hidden units do you have when you use a neural network? [00:00:56] For the training objective, there's the question of what the loss function should be, say the hinge loss or the logistic loss, and then what about regularization: do you use regularization, and what should its strength be? [00:01:10] For the optimization algorithm, even vanilla stochastic gradient descent has
two hyperparameters: one is the number of epochs, and another one is the step size. [00:01:22] Here it's a constant, but maybe you want it to be decreasing, or you want to use a fancier adaptive step-size rule like AdaGrad or Adam. [00:01:30] If you're training deep neural networks, there are more things to think about: there's initialization, how much noise you add during training, what batch size you use for stochastic gradient descent (batch size 1, or 4, or 16), and what about using a dropout rate to guard against overfitting? [00:01:52] So quickly you see that the design space becomes quite big, and it's really kind of like choose-your-own-adventure. Some of these design decisions can be made based on principles: for example, if you believe that your data has some sort of periodic structure, you can add periodic features. But many, if not most, of the design decisions are really unclear,
[00:02:17] and you sometimes just want an automatic way for these design decisions to be made. [00:02:25] So each of these design decisions is called a hyperparameter: hyperparameters are the design decisions that need to be made before running the learning algorithm. So how do you choose them? [00:02:38] How about we choose the hyperparameters to minimize the training error? This is a really bad idea, because the optimum would be to just include all the features, use no regularization, train forever, and really drive the training loss down, down, down. But remember, the training loss is not the quantity that we care about. [00:03:03] Okay, so how about we choose the hyperparameters to minimize the test error? This might actually generate good hyperparameters, but it's also bad, because now you're looking at the test set, which makes it an unreliable estimate of the actual error. [00:03:23] So what do we do then?
[00:03:27] The solution is to use a held-out validation set, also known as a holdout set or development set. This set is just taken out of the training set, and it's used to optimize hyperparameters, and nothing else. [00:03:39] So here's a picture: you leave the test set alone; it's isolated from what you're doing. And you take the training set and divide it into a validation set, which is usually a small fraction, but large enough to get reliable estimates, and then the rest of the training set. Now, for each setting of the hyperparameters, you can train on the training set minus the validation set, then evaluate on the validation set, and then you choose the hyperparameters to be the ones that minimize the error on the validation set.
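That recipe can be sketched as follows. Everything named here is a hypothetical stand-in for your actual pipeline: `train(train_examples, setting)` returns a predictor, and `error(predictor, examples)` returns its error rate.

```python
import random

def choose_hyperparameters(examples, settings, train, error,
                           val_fraction=0.2, seed=0):
    """Split off a validation set, then pick the hyperparameter setting
    whose trained predictor has the lowest validation error."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    val, train_rest = shuffled[:n_val], shuffled[n_val:]
    # Train on train-minus-validation, evaluate on validation.
    return min(settings, key=lambda s: error(train(train_rest, s), val))
```

Note that the test set never appears here; it stays locked away until the very end.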
[00:04:15] So now I'm going to talk about model development strategy. We've talked a lot about the formal machinery, and I'm just going to walk you through a typical development cycle. [00:04:31] So what you do is start out by splitting the data: you get some data and you split it into train, validation, and test, and you lock away the test set. Then you look at the data, not the test data but the train or validation data, to get intuition; you want to understand what kind of properties the problem you're trying to solve has. [00:04:55] And then you repeat the following: you implement a model architecture or a feature extractor, or you adjust some hyperparameters, and then you run the learning algorithm to train a model. Then you look at and sanity-check the train and validation errors along the way, making sure the training error is going down, and making sure the validation error more or less goes down: if it goes up, that means you're overfitting. You also want to look at, at least for linear classifiers, the weights, if they're
interpretable, as a sanity check and to get some intuition. [00:05:31] And you also want to look at some prediction errors: if the model is not doing as well as you'd like, you want to understand how it is screwing up. You repeat this until you're satisfied, and then finally you unlock the test set: you evaluate on the test set to get your final error rates that you put in your report. [00:05:53] So let's walk through an example of how this works. I'm going to take the simple example of named-entity recognition. Here the input is a string which contains a name, along with a word to the left and a word to the right to offer some context, and the output is going to be whether x, excluding this initial and final word, is a person or not. In this case, "Gavin Newsom" is plus one, a person. [00:06:28] Okay, so now I'm going to code this up. We have ner.py,
[00:06:38] which is the file that we're going to use, and this file actually depends on submission.py and util.py from your sentiment homework, so if you have that, you can plug in your code and see it in action for yourself. [00:06:54] Let me just walk through this: first we read the training examples and the validation examples, and then we're going to learn a predictor. This returns a set of weights; we're going to output the weights to a file, and output the error analysis to a file, which I'll show you in a second. And then this part is commented out, because we don't want to run evaluation on the test set just yet. [00:07:21] Okay, so the first thing we want to do is just open up this training file, to get some intuition for what the data looks like. Each line here is a training example: this is y, a minus, which means not a person, and this is x, the string.
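A reader for that file might look like the following sketch. The exact on-disk format isn't shown in the lecture, so this assumes each line is a "+" or "-" label, a space, and the example string; the function name is my own, not necessarily the one in ner.py.

```python
def read_examples(path):
    """Read (x, y) pairs from lines like '- took Mauritius into',
    where y is +1 for a person and -1 otherwise (assumed format)."""
    examples = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            label, x = line.split(" ", 1)
            y = 1 if label.startswith("+") else -1
            examples.append((x, y))
    return examples
```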
[00:07:36] "took Mauritius into": Mauritius is not a person. US is not a person, Malaysia is not a person, Sarah Pakowski is a person, plus one, Moscow is not a person, and so on. You can see all these training examples; we have around 7,000. [00:07:59] Okay, so now let's begin by implementing the feature extractor. I have to implement this function, which is going to take x, and, just to put in a comment of what x looks like, an example x is the string here. Then I'm going to define the feature vector phi to be a dictionary, so this is going to be a sparse representation of the feature vector. [00:08:34] Okay, so that is the simplest feature extractor: it happens to be the empty vector, with no features, but let's just see what happens; start simple, and we're starting really simple here. [00:08:49] Okay, so let's run python ner.py. We see that, across a number of
iterations, [00:08:57] the test error is really high: 72 percent error. This is not surprising, because we don't have any features. [00:09:06] Okay, so let's add some features. Maybe a kind of obvious feature to add is to look at the identity of the entity. So what we're going to do is process x a little bit: I'm going to split it into a bunch of tokens, a list containing "took", "Mauritius", and "into", and then I'm going to split that up into the left word, the entity, and the right word. This is going to be tokens[0]; tokens[1] through everything except for the last token; and then the last token. Okay, so I'm just going to divide x into these three parts. [00:09:52] So now I can define a feature template: let's define the feature to be phi of "entity is" the entity, and, since the entity is now an array, I'm going to join it, and I'm going to
[00:10:10] set that to 1. So this is one line that represents one feature template, but it instantiates into a whole bunch of different features, one for every possible entity. [00:10:27] I'm naming the feature in a way that makes it really interpretable; we'll see how this is quite useful. [00:10:38] Okay, so let's run this and see what happens: now the error is 19 percent, so that's progress. The training error is really low, which means that we're really fitting the training data. [00:10:52] So now let's go and inspect what happens. If we look at the weights here, sorted from positive to negative, we have the feature name and the weight. Up here you can see features like "entity is ...", and these generally happen to be people's names, and if you look at the bottom, we see things which are not names. Okay, so this is a good sanity check that suggests that the learning is working.
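A sketch of the extractor at this point (the function name and the exact feature-string format are my own; the course's ner.py may differ):

```python
def feature_extractor(x):
    """Sparse feature vector for a string like 'took Mauritius into'.

    One feature template, 'entity is ___', instantiated once per entity;
    `left` and `right` are split off here but only used by the context
    templates added later."""
    tokens = x.split()
    left, entity, right = tokens[0], tokens[1:-1], tokens[-1]
    phi = {}  # sparse representation: feature name -> value
    phi["entity is " + " ".join(entity)] = 1
    return phi

phi = feature_extractor("took Mauritius into")
# phi == {"entity is Mauritius": 1}
```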
[00:11:27] Let's look at the error analysis. This shows you, on the validation set, the predictions that the model makes. So here is the first input: the true label is plus one, a person, but we predicted minus one, which is wrong. And here I'm showing the features and their particular weights: "entity is Romero" has a feature value of one, and its weight is zero. A weight of zero generally means that the model never saw this feature at training time, so the score is zero, and we have no idea what to do with this example. [00:12:08] Here is another example, the Senate: it says "entity is Senate", and Senate has a weight of negative one, so we have a score of negative one and we predict minus one. [00:12:20] So let me just look through these incorrect predictions: Margaret Allah, Flamed, Midfielder.
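The scores being read off here are just sparse dot products between the learned weights and the feature vector; a feature absent from the weight dictionary (never seen at training time) contributes zero. A minimal sketch with made-up weights:

```python
def score(weights, phi):
    """Sparse dot product w . phi(x); unseen features get weight 0."""
    return sum(weights.get(name, 0) * value for name, value in phi.items())

weights = {"entity is Senate": -1}             # made-up trained weights
s1 = score(weights, {"entity is Senate": 1})   # negative, so predict minus one
s2 = score(weights, {"entity is Romero": 1})   # zero: feature never seen
```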
[00:12:36] And you can kind of see, well, it's unreasonable to expect that the entities have all been seen before. So why don't we try to use the context to figure out whether the name is a person or not? [00:12:52] So let's go over here: I'm going to define a feature template "left is" the left word, and "right is" the right word. So this is a feature template, "left is blank", as we've written in the past, and I'm instantiating this feature for this particular x, which takes the actual value of left here. Okay, so I added two feature templates; let's run this. [00:13:25] And now we can see that the error rate has gone down to 11 percent. Notice that the training error doesn't actually go down as fast, because with more features sometimes it's harder to optimize, but that's okay: we don't care about the training error, we only care about the test error going down.
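Extending the earlier extractor sketch with the two context templates (again illustrative; names and formats are my own):

```python
def feature_extractor(x):
    """Entity-identity plus left/right context feature templates."""
    tokens = x.split()
    left, entity, right = tokens[0], tokens[1:-1], tokens[-1]
    phi = {}
    phi["entity is " + " ".join(entity)] = 1
    phi["left is " + left] = 1    # e.g. 'left is minister' suggests a person
    phi["right is " + right] = 1
    return phi

phi = feature_extractor("minister Gavin Newsom said")
# {'entity is Gavin Newsom': 1, 'left is minister': 1, 'right is said': 1}
```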
[00:13:52] One note: this says test error, but I'm actually passing in the validation set here; the learn-predictor function just prints "test error" because it has no idea which set it's given. Okay, so let's look at the weights. At the top are still features that look at the entity itself: Clinton, Nelson. Here are some examples: "left is minister", so if you have "minister <someone>", that someone is probably a person, and "president <someone>" is probably a person. If you look down here, you see that if the left context is "the", the weight is negative, which means what follows is probably not a person. So this all makes sense; it's a good sanity check. [00:14:41] Let's look at the error analysis. We're now getting the Romero example correct. Let's see what we're getting wrong: "Felix" and "Attila", which I guess we've never seen before, and "Workers Party", which it's also never seen. And now you can
think more, brainstorm: well, maybe we aren't going to see the exact string match of an entity, but maybe we can break it down into pieces. [00:15:18] So what I'm going to do is, for each word in the entity, I'm going to say "entity contains <word>". It's pretty easy to write feature templates, and this one is very intuitive: it just asks, does this entity contain a particular word? Okay, let's run this, and now the error rate has gone down to six percent, so we're making good progress. [00:15:51] Let's look at the weights. So "entity contains Clinton": this feature will fire both for "Clinton" and for "Bill Clinton". These "contains" features are more general, and they're given high weight.
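The contains-word template might be sketched like this (illustrative names, not the lecture's exact code); note how "Bill Clinton" shares a feature with "Clinton":

```python
# Sketch of the "entity contains ___" feature template: one indicator feature
# per word of the entity string.
def contains_features(entity):
    return {"entity contains " + word: 1 for word in entity.split()}

print(contains_features("Bill Clinton"))
# {'entity contains Bill': 1, 'entity contains Clinton': 1}
```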
[00:16:11] At the bottom, again, if an entity has "New" in it, it's probably something like "New York", and that's probably not going to be a person; "contains Newsroom", and I don't know too many folks by that name. [00:16:25] Error analysis: let's see what's still wrong here. We're still getting a few of these wrong, like this "Kurdistan Workers Party" one, and sometimes it's kind of hard to know what to do. So let's just try something else. Going in the spirit of decomposing the entity into words, we can go further and have patterns that match on prefixes and suffixes. So we can say the entity contains the prefix of a word, and just arbitrarily choose the first four characters, and likewise the suffix of a word, the last four characters. All right, let's see how this does, and now we can see that the error rate is
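A sketch of the prefix/suffix templates (again illustrative, not the lecture's exact code), with four characters as the lecture's arbitrary choice:

```python
# For each word of the entity, fire on its first and last n characters.
def prefix_suffix_features(entity, n=4):
    features = {}
    for word in entity.split():
        features["entity contains prefix " + word[:n]] = 1
        features["entity contains suffix " + word[-n:]] = 1
    return features

print(prefix_suffix_features("Clinton"))
# {'entity contains prefix Clin': 1, 'entity contains suffix nton': 1}
```

Features like "suffix nton" let the classifier generalize to unseen names that merely look like names it has seen.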
[00:17:31] going down to four percent, so we made a little more progress there. I'm actually going to call it quits for now, just in the interest of time. We've made a lot of progress: from seventy percent error to only four percent error. But remember, this is only on the validation set, so now comes the final trial: seeing how well this does on the test set. [00:17:56] So read in the test set, evaluate the predictor on it, and print the result. Let's run this and hope that we didn't overfit. And here we actually did even better on the test set than on the validation set, which sometimes happens; there's always some randomness. So we ended up with four percent error, which is pretty good for ten minutes of work. [00:18:28] In practice, things are probably not going to go as
smoothly as this; this is just an illustrative example, meant to show the kind of process. [00:18:40] Okay, there's much more to be said about the practice of machine learning, but I'm just going to give you some general advice. Many of these tips are related to good software engineering practice. [00:18:57] The first thing I want to talk about is starting simple. The wrong thing to do is to code up a really complicated learning algorithm, run it on a million examples, watch it crash and burn, and wonder what happened. Simplify: both in terms of running on small subsets of your data, maybe even synthetic data, and in terms of starting with a simple baseline model. We started with, literally, a classifier that had zero features, then one feature, just so we could see and understand what it's doing. This is important because
it allows you to work in a regime where things are understandable and, importantly, where things run quickly. [00:19:38] You want fast iteration time, like what you just saw: we could quickly try something, get a result, try something else, get a weird result, and react. If you have to wait ten hours to get a result, you're just not going to make as much progress, because you won't be able to iterate. [00:19:56] One sanity check that I would recommend: try to train on very few examples, like five, and see if you can overfit, that is, drive the training error to zero. Of course, doing so is not going to give you a useful model, but it will tell you whether the machinery is working or not. If you're unable to fit five examples, something is wrong.
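This sanity check can be sketched as follows: a tiny perceptron (not the lecture's code) trained on five hand-made, separable examples. If even this cannot drive the training error to zero, something in the pipeline is broken.

```python
# Sanity check: train on ~5 examples and verify training error reaches zero.
def train_perceptron(examples, epochs=100):
    weights = {}
    for _ in range(epochs):
        for phi, y in examples:           # y in {+1, -1}, phi a feature dict
            score = sum(weights.get(f, 0.0) * v for f, v in phi.items())
            if y * score <= 0:            # misclassified: perceptron update
                for f, v in phi.items():
                    weights[f] = weights.get(f, 0.0) + y * v
    return weights

def train_error(weights, examples):
    mistakes = sum(
        1 for phi, y in examples
        if y * sum(weights.get(f, 0.0) * v for f, v in phi.items()) <= 0
    )
    return mistakes / len(examples)

# Five hand-made, linearly separable toy examples (illustrative features):
examples = [
    ({"contains Clinton": 1}, +1),
    ({"contains Bill": 1, "contains Clinton": 1}, +1),
    ({"contains York": 1}, -1),
    ({"contains New": 1, "contains York": 1}, -1),
    ({"contains party": 1}, -1),
]
w = train_perceptron(examples)
print(train_error(w, examples))  # 0.0
```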
It could mean that your data is too noisy, or that you lack certain features, or that your model is not expressive enough, or that your learning algorithm isn't working. Anyway, it's a good sanity check. [00:20:38] The second thing is: log everything. Print out metrics: track the training loss and the validation loss over time, and make sure they're going down as intended. Record the hyperparameters you're using to train, so you can keep track of what you actually did to get your result. Record statistics of the dataset (how many features, how many examples), of the model (how many weights there are, the norm of the weights), and the predictions, as you saw; it was really useful to have that file showing exactly how the model makes each prediction, because it gives you a lot more insight. [00:21:21] Finally, spend some time figuring out how to organize your experiments. I like to have each run I make go into a separate folder.
[00:21:31] Each run's folder saves the models, the predictions, and a record of all the hyperparameters you used, so that later you can go back and check what you did. [00:21:47] Then a note about reporting your results: it's important to run your experiments multiple times, particularly with different random seeds, to make sure your results are stable and reliable; then you can report the mean and the standard deviation over these random seeds. [00:22:10] And finally, in machine learning we often tend to be guilty of distilling everything down into one number such as the test error, but in practice we might be interested in multiple metrics. In particular, it's important that if you get five percent error, you understand what those errors are.
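A minimal sketch of reporting mean and standard deviation over seeds; `run_experiment` here only simulates a validation error rather than training a real model, so the numbers are stand-ins.

```python
import random
import statistics

# Hypothetical experiment: in a real setup this would train and evaluate a
# model using the given seed. Here it just returns a simulated error.
def run_experiment(seed):
    rng = random.Random(seed)
    return 0.04 + rng.uniform(-0.005, 0.005)  # pretend validation error

errors = [run_experiment(seed) for seed in range(5)]
print(f"error = {statistics.mean(errors):.3f} +/- {statistics.stdev(errors):.3f}")
```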
Sometimes it's useful to report the error rates on different minority groups or subpopulations, if you have access to that information, and generally to be cognizant of the biases in your model. [00:22:47] Okay, to summarize: we've talked about the practice of machine learning. First, make sure you have good data hygiene: separate out your test set and leave it alone, and divide your training set into a validation set and the rest. Don't look at or touch the test set, but do look at the training and validation sets to understand the shape of your data, so that you have intuition for deciding how to model it. Start simple. And finally, there are a lot of design decisions, which can be overwhelming at first; the most important thing is to practice doing experimentation, so that you start developing an intuition for which hyperparameters matter and what kind of effect they have, and then eventually developing a set of
best practices for yourself. [00:23:42] Okay, that's all.

================================================================================ LECTURE 016 ================================================================================ Machine Learning 13 - K-means | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=5-Fn8R9fH7A --- Transcript

[00:00:05] Hi, in this module I'm going to talk about k-means, a simple algorithm for clustering, which is one form of unsupervised learning. [00:00:13] I want to start with a classical example of clustering from the NLP literature: Brown clustering. This was the unsupervised learning method of choice before word vectors, contextualized word embeddings, and so on. The input to the algorithm was simply raw text, lots and lots of words, and the output was a clustering of those words. [00:00:41] The algorithm was able to pick out cluster one, which had Friday, Monday, Thursday, and generally days of the week; cluster two had months; cluster three had some sort of natural resources; and so on down the list, and each cluster had
[00:00:58] fairly coherent structure in it. [00:01:05] One thing that's quite interesting to note is that no one told this algorithm what the days of the week were, or the months, or what family relations are; it was able to figure all of this out just by looking at the data. [00:01:19] On a personal note, Brown clustering was actually my first experience that got me to pursue research in NLP in the first place; just seeing the results of unsupervised learning when it worked was really magical. And of course, today we're seeing even stronger evidence of the sheer potential of unsupervised learning with large language models. [00:01:45] So I want to contrast unsupervised learning with supervised learning. In supervised learning we looked at classification: you're given a training set which is labeled, so inputs are labeled with an output y.
[00:02:01] This goes into a learning algorithm, you get a classifier, and then you predict on new points. [00:02:09] The main challenge with labeled data is that it is expensive to obtain: you need annotators, often domain experts. [00:02:21] In contrast, unsupervised learning, of which clustering is one form, uses unlabeled data, which is very cheap to obtain. [00:02:34] As a concrete example, suppose you have some unlabeled points here; any dataset can be visualized as points like this. We want a learning algorithm that produces not a predictor but an assignment of each point to a cluster. Say we have two clusters: let's assign the first four points to the blue cluster, and the second set of points down here to the orange cluster. So
intuitively, we want to assign nearby points to the same cluster, and you can kind of see that these points are close to each other and those points are close to each other, with some separation between the two clusters. [00:03:36] More formally, in the task of clustering we're given some training points Dtrain, a list of points x1 through xn, and the output is an assignment of each point to a cluster. Formally, we have an assignment vector z = (z1, ..., zn), where each zi is a number between 1 and K. So, assuming we have K clusters, each point is assigned to one of them. [00:04:17] So what makes a cluster? The key assumption behind k-means is that each cluster can be represented faithfully by a centroid, and we concatenate all the centroids together to form mu. [00:04:36] This diagram illustrates what a centroid is trying to
capture. [00:04:43] The centroid is, in some sense, a point which is closest to all the other points in that cluster: it represents the cluster by some concrete point in space. So the intuition, in terms of centroids, is that we want each point to be close to its assigned centroid mu_{zi} (a bit of notation which I'll go through later); intuitively, you can look at this point over here: we want it to be close to the centroid of its assigned cluster, and this other point to be close to its centroid. [00:05:27] So now we can define the k-means objective function; here's a picture which I'll walk through. The k-means objective is denoted as a loss, the k-means loss function, and it's a function of the cluster assignments z1 through zn and the cluster centroids mu_1 through mu_K. It is equal to a sum over all n points, looking at the
[00:06:01] i-th point: zi is a number between 1 and K specifying which cluster point i is assigned to, so I access its centroid mu_{zi}, take the difference between the point and that centroid, and square it. So each term is the squared distance between a point and its assigned centroid:

    Loss_kmeans(z, mu) = sum over i = 1..n of ||x_i - mu_{zi}||^2

[00:06:32] Pictorially, for each point I look at its assigned centroid and the squared length of the dashed line between them; the sum of the squares of all the dashed lines is exactly the k-means loss, and I want this to be as small as possible. So I want to minimize this objective with respect to both the cluster assignments and the centroids. [00:07:11] To get some intuition, let's consider a simpler example in one dimension.
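The objective can also be written directly in code. This sketch uses the lecture's four 1-D points with 0-indexed clusters (the lecture numbers clusters from 1):

```python
# The k-means loss: sum of squared distances between each point and its
# assigned centroid, for 1-D points.
def kmeans_loss(points, z, mu):
    return sum((x - mu[zi]) ** 2 for x, zi in zip(points, z))

points = [0, 2, 10, 12]   # the lecture's four 1-D points
z = [0, 0, 1, 1]          # assignments (0-indexed)
mu = [1, 11]              # centroids
print(kmeans_loss(points, z, mu))  # (0-1)^2 + (2-1)^2 + (10-11)^2 + (12-11)^2 = 4
```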
So here we have four points, at 0, 2, 10, and 12. [00:07:27] I'm going to consider the case where we know what the centroids are: does that make our life easier? If we know the centroids are at 1 and 11, this indeed becomes a pretty trivial problem, because now, how do we assign points? We just assign each point to the closest centroid, since we know where the centroids are. The point 0 is closest to 1, so z1 gets cluster one; for z2, the point 2 is also closest to 1, so cluster one; the point 10 is closest to 11, the centroid for cluster two; and the same for 12. So all I'm doing is looking at all the centroids and asking which one is closest to the point I'm trying to assign. [00:08:22] Now let's consider the opposite case, where I don't know the centroids but I do have the assignments. If I have the assignments, then I can also compute the centroids.
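The assignment step just described, with the centroids fixed at 1 and 11, can be sketched as (0-indexed clusters, illustrative code):

```python
# Step 1 of k-means: with centroids mu fixed, assign each point to the
# closest centroid by squared distance.
def assign_clusters(points, mu):
    return [min(range(len(mu)), key=lambda k: (x - mu[k]) ** 2) for x in points]

print(assign_clusters([0, 2, 10, 12], [1, 11]))  # [0, 0, 1, 1]
```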
[00:08:37] To compute the centroid of the first cluster, I simply look at all the points that are assigned to that cluster, and remember, I want to find the centroid which is as close as possible to all of them on average. So it is the minimizer of the sum of squared distances, and recall that this is optimized in closed form by setting the centroid to the mean of the points assigned to that cluster. So for mu_2, the points 10 and 12 are assigned to that cluster, and their mean is 11.
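The centroid step can likewise be sketched in code (0-indexed clusters; for simplicity an empty cluster is left at 0 here, which real implementations handle differently):

```python
# Step 2 of k-means: with assignments z fixed, set each centroid to the mean
# of the points assigned to it.
def update_centroids(points, z, K):
    return [sum(x for x, zi in zip(points, z) if zi == k)
            / max(1, sum(1 for zi in z if zi == k))
            for k in range(K)]

print(update_centroids([0, 2, 10, 12], [0, 0, 1, 1], 2))  # [1.0, 11.0]
```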
[00:09:20] So now, given either the cluster assignments or the centroids, we can successfully recover the other one optimally. But this is a chicken-and-egg problem, because we have neither the centroids nor the assignments to begin with. So what can we do? Well, let's just take a gamble and initialize randomly. We're going to initialize the centroids to some random values; usually they are assigned to some of the existing data points, so let's assign them to the points at 0 and 2. Clearly this is not optimal, but let's try to iterate. In the first iteration, we fix these centroids and optimize the cluster assignments, so we look at each point and try to assign it to one of the clusters. The point at 0 is closest to the centroid at 0, so I assign it to cluster 1. The point at 2 is closest to cluster 2
[00:10:33] because its centroid is right on top of it, so that goes to cluster 2, and the points at 10 and 12 are also closest to cluster 2, so I assign them there as well. Then I use these new assignments to re-estimate the centroids. The first cluster contains only the point at 0, so its centroid stays there. For the second cluster I now have three points, and I place the centroid to minimize the squared distance to all of them, which is their average, (2 + 10 + 12) / 3 = 8. So now I have these updated centroids at 0 and 8, and you can see this is looking a bit better. In the second iteration, I'm going to reassign the points based on these new centroids. The first
[00:11:43] point is going to be assigned to cluster 1, right here. The point at 2 is also going to be assigned to cluster 1, because the centroid at 0 is closer to 2 than the centroid at 8 is. The point at 10 is going to be closest to the second cluster, and the same with the point at 12. [00:12:10] So now we have new cluster assignments, and we can go back and re-estimate the centroids. We're back in the familiar problem from the previous slide: for the first cluster, we set the centroid to be the mean of the two points, which is 1, and the same for the second cluster, where the mean of 10 and 12 is 11.
[00:12:39] And now we've actually converged: if you try to repeat this process, nothing will change, so we're done. In this case it happens to recover the optimal clustering for these four points, even though we didn't know anything to start out with. [00:12:54] So here is the k-means algorithm stated more formally. First, we initialize all the centroids randomly. Then we iterate T times, or until convergence, alternating between step 1, which sets the assignments given the centroids, and step 2, which sets the centroids given the assignments. In step 1, we go through each point i and set its assignment z_i: for each of the clusters k = 1 through K, we look at where the point is and compute
[00:13:47] the squared distance between the point and the centroid of that cluster, and then we take the argmin, the cluster k which minimizes this distance. For step 2, we loop over all the clusters, and we set the centroid of each cluster by looking at all the points i which are assigned to that cluster, summing them up, and dividing by the number of points we summed over to get the average. Okay, so that is the k-means algorithm. [00:14:42] Now, one word about whether it works or not. The k-means algorithm is guaranteed to converge to a local minimum of the k-means objective, but it is not guaranteed to find the global minimum. Here's a cartoon picture of an optimization landscape: it can converge to a local minimum but not the global one.
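The two alternating steps just described can be sketched in a few lines of Python. This is a minimal 1-D version of my own (the lecture's μ_k become plain numbers), initializing the centroids at random data points as in the example:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal 1-D k-means: alternate assignment and centroid steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize at random data points
    z = [0] * len(points)
    for _ in range(iters):
        # Step 1: assign each point to the closest centroid (argmin over k).
        z = [min(range(k), key=lambda j: (p - centroids[j]) ** 2)
             for p in points]
        # Step 2: move each centroid to the mean of its assigned points.
        new = []
        for j in range(k):
            members = [p for p, zi in zip(points, z) if zi == j]
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:  # converged: another pass would change nothing
            break
        centroids = new
    return centroids, z

centroids, z = kmeans([0, 2, 10, 12], k=2)
print(sorted(centroids))  # [1.0, 11.0] -- the optimum found in the example
```

On these four points, any initialization at two distinct data points converges to centroids 1 and 11, matching the worked example.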
[00:15:04] So here, if you click here, I have a demo which shows you how k-means works. You can construct your own set of training examples, and if you step through the k-means algorithm, it initializes and then alternates between moving the centroids around and reassigning the points. In this happy case, we actually get to a pretty good clustering: the blue points over here, red points over here, and green over there, with a k-means objective of 44.7. But if I initialize in a slightly different way, let's see what happens: it converges to something with much worse error, and you can see visually that this is a sub-optimal clustering, because this cluster has only one point, whereas this other cluster has many points which are spread out. So what do you do about this? Well, there are a couple of
[00:16:19] things you can do. One is to just run the algorithm multiple times with different random initializations and take the best result. Another is to use a smarter heuristic. I didn't say very much about initialization, but there's a nice method called k-means++, where you initialize the centroids one at a time, setting each to a data point which is as far away as possible from the centroids chosen so far. This makes sure that the centroids are spread out, so that they can move over time to capture all the points in your data.
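The initialization idea described here can be sketched as follows. This is my own illustration of the farthest-point heuristic as the lecture states it; note that the actual k-means++ algorithm randomizes this step, sampling each new centroid with probability proportional to its squared distance from the centroids chosen so far:

```python
import random

def farthest_point_init(points, k, seed=0):
    """Pick centroids one at a time, each the data point farthest
    (in squared distance) from the centroids chosen so far."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: a random data point
    while len(centroids) < k:
        # For each point, distance to its nearest already-chosen centroid...
        def d2(p):
            return min((p - c) ** 2 for c in centroids)
        # ...and take the point that maximizes that distance.
        centroids.append(max(points, key=d2))
    return centroids

init = farthest_point_init([0, 2, 10, 12], k=2)
print(init)  # the two centroids land on opposite ends of the data
```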
[00:17:01] Okay, to wrap things up: so far we've talked about k-means, which is an algorithm for doing clustering, and clustering is a useful task that allows us to discover structure in unlabeled data, in particular to group points together. It's useful to distinguish two meanings of "k-means." [00:17:25] The first is the k-means objective: an objective function that says find assignments and centroids that minimize the squared distance between each point and the centroid of its assigned cluster. The second is the k-means algorithm, which performs alternating minimization on the k-means objective: setting the assignments given the centroids, and setting the centroids given the assignments. Because of this chicken-and-egg structure, it is not guaranteed to globally minimize the k-means objective, although it usually gets pretty good results. [00:18:12] Stepping back a little bit: k-means is for clustering, and there are other types of clustering methods out there. Clustering is a form of unsupervised learning, and generally unsupervised learning has a few use cases. One is just data
[00:18:32] exploration and discovery: you get a pile of data that you might not have had a chance to annotate or label, so you can run clustering or other types of unsupervised learning to group points and discover structure, to get insight. The second use case is that when you perform clustering, or some other kind of representation learning, you can get useful representations or features that you can feed into downstream supervised learning problems once you do get a bunch of labeled data, and this generally helps supervised learning work better. [00:19:15] Okay, so that's the end of this module. ================================================================================ LECTURE 017 ================================================================================ Search 1 - Dynamic Programming, Uniform Cost Search | Stanford CS221: AI (Autumn 2019) Source: https://www.youtube.com/watch?v=aIsgJJYrlXk --- Transcript [00:00:05] Hi everyone, I'm Dorsa, and this week I'll be teaching the state-based models part of the class. The plan is for the next couple of weeks for me to teach state-
based models, MDPs, and games, and after that Percy will come back and talk about the later topics. [00:00:22] A few announcements: homework 3 is out, so just make sure to look at that, and the grades for homework 1 will be coming out soon. [00:00:34] All right, so let's talk about state-based models; let's talk about search. To start, I was thinking maybe we can begin with this question. Let me tell you what the question is, and then think about it. The question is: you have a farmer, and the farmer has a cabbage, a goat, and a wolf, and everything is on one side of the river. The farmer wants to go to the
[00:01:13] other side of the river and take everything with him. But the farmer has a boat, and the boat can only fit two things, so the farmer can be in it with just one of the other things. The question is: how many crossings does the farmer need to take everything to the other side? And there are a bunch of constraints. If you leave the cabbage and the goat together, the goat is going to eat the cabbage, so you can't do that; if you leave the wolf with the goat, the wolf is going to eat the goat, so you can't do that either. So how many crossings should you take to get everything to the other side? Think about it, talk to your neighbors. Is everyone clear on the question? [Music] [00:02:36] The link doesn't work because I can't connect to the internet, but all right. So how many people think it is four crossings? Five? Six?
[00:02:49] Some people think six, some seven, some say there is no solution. Okay, so the point is actually not what the answer is; we will come back to this question and try to solve it. The important thing to think about right now is how you went about solving it: what was the process you were going through as you tried to solve this problem? That is the commonality that search problems share, and we want to think about these types of problems; they're more challenging to answer than, say, reflex-based questions. So that's just a motivating example that we'll come back to later. And here's an xkcd on this: one potential solution is that the farmer takes the goat to the other side, comes back, takes the cabbage to the other side, and just leaves
[00:03:39] the wolf, because why would a farmer need a wolf? If you were surprised, there's an interesting point in it: sometimes maybe you should change the problem, because your model is completely wrong; sometimes you should rethink, go back to your model, and try to fix it. But anyway, we'll come back to this question.
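Since we'll come back to the crossing puzzle, here is one mechanical way to answer it, as a preview of the search formulation this lecture builds: encode the state as the set of items still on the starting bank and breadth-first search over crossings. (The encoding and names here are my own sketch, not the lecture's.)

```python
from collections import deque

ITEMS = frozenset({"farmer", "cabbage", "goat", "wolf"})
UNSAFE = [{"goat", "cabbage"}, {"wolf", "goat"}]  # pairs left unsupervised

def safe(bank):
    """A bank is safe if the farmer is on it or no unsafe pair remains."""
    return "farmer" in bank or not any(pair <= bank for pair in UNSAFE)

def successors(state):
    """One crossing: the farmer moves alone or with one item from his bank."""
    here = state if "farmer" in state else ITEMS - state
    for cargo in [set()] + [{x} for x in here - {"farmer"}]:
        moved = {"farmer"} | cargo
        new = state - moved if "farmer" in state else state | moved
        if safe(new) and safe(ITEMS - new):
            yield frozenset(new)

def min_crossings():
    """BFS from 'everything on the start bank' to 'start bank empty'."""
    start, goal = ITEMS, frozenset()
    dist, frontier = {start: 0}, deque([start])
    while frontier:
        s = frontier.popleft()
        if s == goal:
            return dist[s]
        for t in successors(s):
            if t not in dist:
                dist[t] = dist[s] + 1
                frontier.append(t)

print(min_crossings())  # 7 -- the classic answer
```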
[00:04:03] All right, so this was our guideline for the class, and we have already talked about reflex-based models: we have talked about machine learning and how it can be applied. Now we want to start talking about state-based models: this week we're going to talk about search problems, next week MDPs, and the week after that, games. If you remember, the guideline that we had for the class was that we were thinking about three different paradigms: modeling, inference, and learning. [00:04:40] For reflex-based models we covered this already: the model could be a linear predictor or a neural network; inference, in the case of reflex-based models, was really simple, just function evaluation, since you have your neural network and you just go about evaluating it; and we also spent some time talking about learning, for example how you would use gradient descent to fit the parameters of the model. [00:05:08] We want to do a similar thing with search-based models and talk about these same three paradigms. The plan is to talk about models and inference today, and then on Wednesday we'll
[00:05:19] talk about learning. We'll have the same sort of format next week too: Mondays are going to be about modeling and inference, Wednesdays about learning, just to give you an idea of what the plan is. [00:05:29] All right, so what are search problems? Let's start with a few motivating examples. One example is route finding: you might have a map and you want to go from point A to point B on the map, and you have an objective, so maybe you want to find the shortest path, or the fastest path, or the most scenic one. The things you can do are take a bunch of actions: you can go straight, turn left, turn right. And the answer to the search problem is going to be a sequence of actions: if you want to go from A to B by the shortest path, the
[00:06:04] answer one would give is maybe: turn right first, then turn left, and then right again, or some other such sequence. So this is just a canonical example of what a search problem is. There are a few other examples. You can think of robot motion planning: if you have a robot that wants to go from point A to point B, it might have different objectives for doing that. Again, the question might be: what is the fastest way of doing it, what is the most energy-efficient way, or what is the safest way? Another question we are interested in is: what is the most expressive or legible way for the robot to do it, so that people can understand what the robot really wants? So again, you can have various types of objectives and formalize them, and then there are the actions you can take: in the
[00:06:50] case of robot motion planning, the robot has different joints, and each one of the joints can translate and rotate, so translations and rotations are the types of actions you can take. In this case I have a robot with seven joints, and I need to say what each one of those joints should do in terms of translation and rotation. This is my robot; yes, it's a Fetch robot. [00:07:13] All right, so let's look at another example: games. This is a fun one. You might think about something like a Rubik's Cube or this 15-puzzle, and again, what do you want to do as a search problem? Well, you want to end up in a configuration that's desirable, a particular target configuration of the Rubik's Cube or the 15-puzzle. That is the goal, that's the objective, and
then the action as you can move pieces around here so the sequence [00:07:41] move pieces around here so the sequence of actions might be how you're moving [00:07:43] of actions might be how you're moving these pieces are on to get that [00:07:44] these pieces are on to get that particular configuration of the 15 [00:07:47] particular configuration of the 15 puzzle ok so again another example of [00:07:49] puzzle ok so again another example of what a search problem is machine [00:07:52] what a search problem is machine translation is an interesting one it's [00:07:55] translation is an interesting one it's not necessarily the most natural thing [00:07:57] not necessarily the most natural thing you might think about when you think [00:07:58] you might think about when you think about search problems but what it is [00:08:00] about search problems but what it is actually you can think about it as a [00:08:01] actually you can think about it as a search problem again so imagine you have [00:08:03] search problem again so imagine you have a phrase in different language and you [00:08:05] a phrase in different language and you want to translate it to English so what [00:08:07] want to translate it to English so what is the objective here well you can think [00:08:09] is the objective here well you can think of the objective as going to fluent [00:08:11] of the objective as going to fluent English and preserving meaning so that [00:08:13] English and preserving meaning so that is the objective that one would have in [00:08:15] is the objective that one would have in machine translation and then the type of [00:08:18] machine translation and then the type of actions that you're taking is you're [00:08:20] actions that you're taking is you're appending words so you start with there [00:08:21] appending words so you start with there and then your appending blue to it and [00:08:23] and then your appending blue to it and you're appending hostage so so as you're [00:08:26] 
you're appending hostage so so as you're appending these these different [00:08:27] appending these these different different words those are the actions [00:08:29] different words those are the actions that you're taking so so in some sense [00:08:31] that you're taking so so in some sense you can have any complex sequential task [00:08:33] you can have any complex sequential task and the sequence of actions that you [00:08:35] and the sequence of actions that you would get to get to your objective is [00:08:38] would get to get to your objective is this going to be the answer for your [00:08:40] this going to be the answer for your search problem and you can pose it as a [00:08:41] search problem and you can pose it as a search problem ok all right so so what [00:08:46] search problem ok all right so so what is different between let's say reflex [00:08:48] is different between let's say reflex based models and search problem so if [00:08:50] based models and search problem so if you remember reflex based models the [00:08:52] you remember reflex based models the idea was you'd have an input X and then [00:08:55] idea was you'd have an input X and then we wanted to find this F for example a [00:08:57] we wanted to find this F for example a classifier that that would output [00:09:00] classifier that that would output something like like this Y which is labo [00:09:02] something like like this Y which is labo it's a plus 1 or minus 1 so the common [00:09:04] it's a plus 1 or minus 1 so the common thing in in these reflex based models [00:09:06] thing in in these reflex based models was we were outputting this this one [00:09:08] was we were outputting this this one they [00:09:09] they this one in this case action being minus [00:09:12] this one in this case action being minus 1 or plus 1 again in search problems the [00:09:15] 1 or plus 1 again in search problems the idea is I've given an input I'm given a [00:09:18] idea is I've given an input I'm given a state 
and then given that I have that [00:09:20] state and then given that I have that state what I want to output is a [00:09:22] state what I want to output is a sequence of actions so I do want to [00:09:24] sequence of actions so I do want to think about what happens if I take this [00:09:26] think about what happens if I take this action like how is that going to affect [00:09:28] action like how is that going to affect the future of my actions okay so so the [00:09:31] the future of my actions okay so so the key idea in search problems is you need [00:09:33] key idea in search problems is you need to consider future consequences of the [00:09:36] to consider future consequences of the actions you take at the currency like [00:09:39] actions you take at the currency like just outputting one thing and so if you [00:09:44] just outputting one thing and so if you rerun it so the question is yeah is it [00:09:45] rerun it so the question is yeah is it not the same as like I'm rerunning it I [00:09:47] not the same as like I'm rerunning it I asked you to thing and then I rerun it [00:09:49] asked you to thing and then I rerun it again [00:09:49] again and then you could do that but that ends [00:09:51] and then you could do that but that ends up being a little bit of a that would be [00:09:53] up being a little bit of a that would be something similar to a greedy algorithm [00:09:54] something similar to a greedy algorithm or like let's say I want to get to the [00:09:56] or like let's say I want to get to the door and I want to find a find the [00:09:58] door and I want to find a find the fastest way and and right now if I just [00:10:00] fastest way and and right now if I just look at like my current state maybe I [00:10:01] look at like my current state maybe I think the fastest way of getting there [00:10:03] think the fastest way of getting there is going this way but if I actually [00:10:05] is going this way but if I actually think about a horizon and I think 
about [00:10:06] think about a horizon and I think about how this action is going to affect my [00:10:08] how this action is going to affect my future I might call with different [00:10:10] future I might call with different sequence of actions yeah all right okay [00:10:15] sequence of actions yeah all right okay so and then we've already seen this [00:10:16] so and then we've already seen this paradigm so let's start talking about [00:10:18] paradigm so let's start talking about modeling and in France during this class [00:10:20] modeling and in France during this class so this is the the plan for today so [00:10:22] so this is the the plan for today so we're going to talk about three [00:10:24] we're going to talk about three different algorithms for for doing [00:10:26] different algorithms for for doing inference for search problems so so [00:10:29] inference for search problems so so we're going to talk about research which [00:10:30] we're going to talk about research which is the most naive thing one could do to [00:10:32] is the most naive thing one could do to solve some of these search problems but [00:10:34] solve some of these search problems but that's the simplest thing we can start [00:10:35] that's the simplest thing we can start with and then after that you want to [00:10:37] with and then after that you want to look at improvements of that doing [00:10:39] look at improvements of that doing dynamic programming or uniform cost [00:10:41] dynamic programming or uniform cost search based problem another flex pays [00:10:46] search based problem another flex pays problem the very fact that in a respect [00:10:48] problem the very fact that in a respect face problem the output that you give [00:10:50] face problem the output that you give does not influence an exchange and it [00:10:53] does not influence an exchange and it doesn't search yeah that's true yeah so [00:10:55] doesn't search yeah that's true yeah so so the output that you get and search 
[00:10:56] so the output that you get and search problem instance action it actually [00:10:58] problem instance action it actually influences your future yeah that's a [00:11:00] influences your future yeah that's a good way of actually thinking about it [00:11:03] all right so so let's talk about [00:11:05] all right so so let's talk about research so let's go back to our [00:11:08] research so let's go back to our favorite example okay so we have the [00:11:11] favorite example okay so we have the farm area cabbage go-to in both so let's [00:11:13] farm area cabbage go-to in both so let's think about all possible actions that [00:11:15] think about all possible actions that one can take when we have this farmer [00:11:18] one can take when we have this farmer cabbage goat interval okay so so a bunch [00:11:21] cabbage goat interval okay so so a bunch of things we can do is the farmer [00:11:22] of things we can do is the farmer I can go to the other side of the river [00:11:24] I can go to the other side of the river with the boat alone so this triangle [00:11:27] with the boat alone so this triangle here just means like going to the other [00:11:29] here just means like going to the other side of that de river the farmer can [00:11:32] side of that de river the farmer can take the cabbage so C's for cabbage G's [00:11:34] take the cabbage so C's for cabbage G's for it go to WC for both so another [00:11:37] for it go to WC for both so another possible action is the farmer takes a [00:11:39] possible action is the farmer takes a cabbage or the farmer takes a goat or [00:11:40] cabbage or the farmer takes a goat or the farmer takes a wolf and goes to the [00:11:42] the farmer takes a wolf and goes to the other side of the river you also have a [00:11:44] other side of the river you also have a bunch of other actions the farmer can [00:11:46] bunch of other actions the farmer can come back the farmer I can come back [00:11:47] come back the farmer I can come back 
with the cabbage come back with the goat [00:11:49] with the cabbage come back with the goat collaborative so I'm basically numerate [00:11:52] collaborative so I'm basically numerate enumerate all possible actions that that [00:11:55] enumerate all possible actions that that one could ever do and sure none of some [00:11:57] one could ever do and sure none of some of these might not be possible in [00:11:59] of these might not be possible in particular States but I'm just creating [00:12:01] particular States but I'm just creating this library of actions things that are [00:12:03] this library of actions things that are possible yeah so then when we think [00:12:06] possible yeah so then when we think about this as a search problem we could [00:12:09] about this as a search problem we could create a search tree which which [00:12:11] create a search tree which which basically starts from an initial state [00:12:13] basically starts from an initial state of where things are and then we can kind [00:12:16] of where things are and then we can kind of think about where we could go from [00:12:19] of think about where we could go from that initial state so the search tree is [00:12:20] that initial state so the search tree is more of a what if what if tree which [00:12:23] more of a what if what if tree which which allows you to think about what are [00:12:25] which allows you to think about what are the possible options that you can take [00:12:27] the possible options that you can take so conceptually what it looks like is [00:12:30] so conceptually what it looks like is you're starting with your initial state [00:12:32] you're starting with your initial state where everything is on one side of the [00:12:34] where everything is on one side of the river so those two lines are it is the [00:12:37] river so those two lines are it is the river white and you can take a bunch of [00:12:40] river white and you can take a bunch of actions right like one possible action 
[00:12:42] actions right like one possible action is you can take the cabbage and go to [00:12:44] is you can take the cabbage and go to the other side of the river and you end [00:12:46] the other side of the river and you end up in that state and that's a little not [00:12:48] up in that state and that's a little not a good state so I'm making that red well [00:12:50] a good state so I'm making that red well why is that because the wolf is going to [00:12:51] why is that because the wolf is going to eat the goat that's not that great okay [00:12:55] eat the goat that's not that great okay and every action every crossing let's [00:12:57] and every action every crossing let's say let's say every crossing takes cost [00:12:59] say let's say every crossing takes cost of one so that one that you see on the [00:13:01] of one so that one that you see on the edge is the cost of that action okay so [00:13:04] edge is the cost of that action okay so that didn't really work that well what [00:13:05] that didn't really work that well what else can I do well I can I can do [00:13:08] else can I do well I can I can do another action I can I can from the [00:13:10] another action I can I can from the initial State [00:13:10] initial State I can take the goat and go to the other [00:13:12] I can take the goat and go to the other side of the river that ends up in this [00:13:15] side of the river that ends up in this configuration from there the farmer [00:13:17] configuration from there the farmer could come back take the cabbage go to [00:13:19] could come back take the cabbage go to the other side end up in this [00:13:21] the other side end up in this configuration the farmer can come back [00:13:23] configuration the farmer can come back that's again not a great States because [00:13:25] that's again not a great States because cabbage and goat are left on the other [00:13:27] cabbage and goat are left on the other side of the river good is going to eat [00:13:29] side of the 
river good is going to eat the cabbage that's not great what else [00:13:31] the cabbage that's not great what else can I do well the farmer can come back [00:13:33] can I do well the farmer can come back with the goat and [00:13:35] with the goat and once the farmer comes back with the goat [00:13:36] once the farmer comes back with the goat the farmer leaves the goat takes the [00:13:39] the farmer leaves the goat takes the wolf goes to the other side comes back [00:13:41] wolf goes to the other side comes back gets the goat again and then okay so so [00:13:46] gets the goat again and then okay so so how many steps is to stay cool one two [00:13:48] how many steps is to stay cool one two three four five six and seven so so the [00:13:51] three four five six and seven so so the ones mice are seven that was a right [00:13:53] ones mice are seven that was a right answer and that is kind of the idea of [00:13:57] answer and that is kind of the idea of getting to this so you could have this [00:14:10] getting to this so you could have this giant tree where you go to different [00:14:12] giant tree where you go to different states but we can actually have like a [00:14:14] states but we can actually have like a counter that tells you if I visited that [00:14:16] counter that tells you if I visited that state and if you have visited that state [00:14:17] state and if you have visited that state maybe you don't want to go there again [00:14:18] maybe you don't want to go there again because because you have already [00:14:19] because because you have already explored all the possible actions from [00:14:21] explored all the possible actions from there you're not done with this tree [00:14:23] there you're not done with this tree though right like I found this this good [00:14:26] though right like I found this this good state here but maybe there's a better [00:14:27] state here but maybe there's a better way of like getting there I don't know [00:14:29] way of like 
getting there I don't know yet I haven't explored everything so so [00:14:31] yet I haven't explored everything so so what I can do is I can actually explore [00:14:33] what I can do is I can actually explore all these other things that that one [00:14:34] all these other things that that one could do not gonna go over them but [00:14:37] could do not gonna go over them but there is another solution and turns out [00:14:39] there is another solution and turns out that other solution also takes seven [00:14:41] that other solution also takes seven steps so it's not necessarily a better [00:14:42] steps so it's not necessarily a better solution but but you got to explore all [00:14:44] solution but but you got to explore all of that because there could be another [00:14:45] of that because there could be another solution later on that that is better [00:14:48] solution later on that that is better than the seven steps all right the wiser [00:14:58] than the seven steps all right the wiser okay all right so so this is how the [00:15:03] okay all right so so this is how the search tree looks like oh that's a very [00:15:09] search tree looks like oh that's a very good point thank you for saying so for [00:15:11] good point thank you for saying so for CPD students I'll try to repeat the [00:15:13] CPD students I'll try to repeat the questions I always forget this I'll try [00:15:15] questions I always forget this I'll try to repeat the questions the question was [00:15:17] to repeat the questions the question was was the slice or the slides are up they [00:15:19] was the slice or the slides are up they are up they should be up okay all right [00:15:22] are up they should be up okay all right so going back to our search problem so [00:15:25] so going back to our search problem so we can try to formalize this search [00:15:27] we can try to formalize this search problem so let's actually think about it [00:15:29] problem so let's actually think about it more formally so 
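The hand-drawn exploration just described, expanding states, pruning unsafe ones, and skipping states already visited, can be sketched as a small breadth-first search. The state encoding and names here are my own, not code from the lecture; since every crossing costs 1, the search depth directly gives the minimum number of crossings.

```python
from collections import deque

# Each state records which bank (0 or 1) the farmer, wolf, goat,
# and cabbage are on; every crossing costs 1.
START, GOAL = (0, 0, 0, 0), (1, 1, 1, 1)

def safe(state):
    f, w, g, c = state
    if w == g != f:   # wolf alone with the goat: wolf eats goat
        return False
    if g == c != f:   # goat alone with the cabbage: goat eats cabbage
        return False
    return True

def successors(state):
    f = state[0]
    for cargo in (None, 1, 2, 3):          # cross alone, or with W/G/C
        if cargo is not None and state[cargo] != f:
            continue                        # cargo must be on the farmer's bank
        nxt = list(state)
        nxt[0] = 1 - f                      # the farmer always crosses
        if cargo is not None:
            nxt[cargo] = 1 - f
        nxt = tuple(nxt)
        if safe(nxt):
            yield nxt

def min_crossings():
    frontier, seen = deque([(START, 0)]), {START}
    while frontier:
        state, depth = frontier.popleft()
        if state == GOAL:
            return depth
        for nxt in successors(state):
            if nxt not in seen:             # skip already-visited states
                seen.add(nxt)
                frontier.append((nxt, depth + 1))

print(min_crossings())  # 7, matching the seven-step solutions in the tree
```

Running it confirms the count from the lecture: the best solution takes seven crossings.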
[00:15:22] OK, all right, so going back to our search problem: we can try to formalize this search problem, so let's actually think about it more formally. What are the things that we need to keep track of? We have a start state, so let's define s_start to be the start state. In addition to that, we can define this function called Actions, which returns all possible actions from a state: Actions(s) is a function of the state, and if I'm in a state, it tells me what actions I can take from there. I can then define this cost function: Cost(s, a) takes a state and an action and tells me what the cost of taking that action is. In this example the cost of crossing the river was just one, but you can imagine having different cost values. We can have a successor function, Succ(s, a), that takes a state and an action and tells us where we end up: if I'm in state s and I take action a, where would I end up? And then we are going to define IsEnd(s), the function which basically checks whether you're in an end state, where we don't have any other actions to take. You can think of it as a finite-state-machine type of way of looking at it. Yeah, and we use a similar type of formalism for MDPs and games too, so it's just a good idea to get all these pieces written down: start state, actions, costs, successors (the transitions), and IsEnd.

[00:16:55] So the question: OK, so the actions depend on the state. You start from the start state, where you haven't taken any actions yet, right, and then from that start state you can think about all possible actions you can take, like right up there: you're in that start state, and you consider all possible actions. Those actions depend on the current state, but they don't depend on the future state. So based on my current state, where everything is on one side of the river, I can think about all possible actions I can take, and I know where I end up; and then the next action depends on that state. So it's a sequential thing.

[00:17:35] Yes, another question: you have all the information on the actions and the costs beforehand, so how is this conceptually different from, say, a min-cost-flow convex optimization? OK, so how is it different from a convex-optimization type of approach? Well, you have an objective here, and you can think of what that objective is, and based on what that objective is, you can have different methods for solving it. So you can basically formulate this as an optimization problem, where you look for the solution to a search problem by solving an optimization problem; that's a perfectly valid way of doing it. And we are going to talk about various types of methods for solving this problem today.
[00:18:09] Yeah. All right, so let's look at another example. This is a transportation problem. So basically, we have street blocks numbered 1 through n, so 1, 2, 3, and so on: these are street blocks, and what we want to do is travel from block 1 to block n. We have two possible actions. At any state s, I can either walk, and if I walk I end up at s + 1, so if I'm at 3 I'm going to end up at 4, and walking takes 1 minute; or I can take this magic tram, and the magic tram takes any state s to 2s, so if I'm at 3 then I'm going to end up at 6 by taking the magic tram, and the magic tram always takes 2 minutes, no matter where you start. So if I'm at 2 I'll end up at 4, or if I'm at 5 I can end up at 10 by taking the tram. OK, so I have two possible actions in any of these states, and what I want to do is go from 1 to n, and I want to do that in the shortest time possible, that is, with the least amount of cost. That's the problem.

[00:19:40] All right, so this is what the search problem is, and what we want to do first is formalize it, and I'm going to do that here. I'm not going to do live solutions, because I'm not Percy, and I did that once and it was a disaster; these were taped in 2018. But basically we're going to go over it together. So we're going to define the search problem: we're going to define a class for the transportation problem, and we're going to separate our search problem from our algorithms, because remember, modeling is separate from inference. So let's just have a constructor for this transportation problem. It takes n, because we have n blocks; n is the number of blocks.

[00:20:33] All right, so then we need to have a start state: we are starting from 1, block one. And then we need to define isEnd: isEnd basically checks whether you have reached n or not, because that's where we have to get to.

[00:20:53] All right, so what else do we need? We have a successor function, and we also have a cost function; I'm going to put both of them together, because that is just easier. So for the successor-and-cost function, I'm saying: let's just give it a state s, and given that state, it's going to return triples of (action, new state, cost). I give it a state, and it returns all possible actions together with the new states I can end up at and how much each one costs. So what are my options? Well, if I'm in state s, I can walk to s + 1, and that costs 1; or if I'm in state s, I can take the tram, end up at 2s, and that costs 2. OK, so that's how I'm creating my triples. And I need to check that I don't go past the n-th block; remember, we have n blocks, so we don't want to go past block n, and that check just makes sure we stay at or below the n-th block. And then this is what my successor-and-cost function will return, that list of triples. So let's just return that.

[00:21:57] OK, so that is my transportation problem. Let's make sure it does the thing we want. Let's say we have 10 blocks, and now I want to print what my successor-and-cost function returns; let's say I'm asking for the successors and costs of state 3. What should I get? Well, from 3 I can take two actions: I can either walk or I can take the tram. If I walk, it costs one; if I take the tram, it costs two; and I'll end up at 4 or at 6.
do one thing [00:22:32] if I'm a state 9 I can only do one thing I can walk right cuz remember the the [00:22:34] I can walk right cuz remember the the block is number of blocks is 10 and I [00:22:36] block is number of blocks is 10 and I can't go beyond that so alright okay so [00:22:42] can't go beyond that so alright okay so that was yeah let's go back here so that [00:22:47] that was yeah let's go back here so that was just defining the search problem [00:22:50] was just defining the search problem yeah and I haven't told you guys like [00:22:54] yeah and I haven't told you guys like how to solve it right this is we were [00:22:56] how to solve it right this is we were just doing the modeling right now so we [00:22:58] just doing the modeling right now so we just modeled this problem just coated it [00:23:00] just modeled this problem just coated it up modeling it means what is this what [00:23:02] up modeling it means what is this what are the actions what is a successor [00:23:04] are the actions what is a successor function what is the cost function [00:23:06] function what is the cost function defining an is end function say but what [00:23:09] defining an is end function say but what the initial state is okay so so now I [00:23:11] the initial state is okay so so now I think we are ready to think about the [00:23:14] think we are ready to think about the algorithm in terms of like going in [00:23:16] algorithm in terms of like going in solving these types of search problems [00:23:18] solving these types of search problems so the simplest algorithm we want to [00:23:21] so the simplest algorithm we want to talk about is backtracking search so the [00:23:25] talk about is backtracking search so the idea of backtracking search is maybe I [00:23:28] idea of backtracking search is maybe I can draw a tree here is you're starting [00:23:31] can draw a tree here is you're starting from an initial state and then you have [00:23:32] from an initial state and then 
[00:23:35] You end up in some state, and you have a bunch of other possible actions; let's say two actions are possible. This exponentially blows up, so I'm going to stop soon. All right, so you create this tree, and the tree has some branching factor, the number of actions you have at every state, and it also has some depth, how many levels you go down; let me just denote that by D. And now the solutions are down at the leaves of this tree, right? So we want to figure out what those solutions are, and backtracking search just does the simplest thing possible: it starts from the initial state, goes all the way down, and if it doesn't find a solution it goes back and tries again, and tries again, and it goes over the whole tree, because there might be a better solution down here too, so it needs to actually go over all of the tree. Okay, so I'm going to have a table of algorithms, because we're going to talk about a few of them: the algorithm, what sort of costs it allows, how bad it is in terms of time, and how bad it is in terms of space. If you've taken an algorithms course, some of these are probably familiar. All right, so we talked about backtracking search. Backtracking search is basically this algorithm that goes through pretty much everything, and it allows any type of cost: I can put pretty much any cost I want on these edges, because I'm going over all of the tree, so it doesn't matter what these costs are. Okay, so how bad is this in terms of time? In terms of time I'm going over the full tree, so this has the exponential blow-up where I'm looking at order of b^D, where b is again my branching factor and D is the depth of the tree. In terms of time this is not a good algorithm: I have to go over everything in the tree, and that's the size of my tree. In terms of space, what I mean is that I need to be able to recover the sequence of actions I took to get to some solution. So say my solution is down here; for me to know how I got here, the things I need to store are the ancestors of this node, and there are D of them. So in terms of space this algorithm takes order of D, because that's all I need to keep in memory to recover everything. A student asked: wouldn't the space be bigger, b^D as well, because until you get there you need the space to hold everything? No; we will actually talk about breadth-first search later, which does require larger space. The reason you can forget things here is that the only history I need to keep track of is this particular branch: I don't need to keep the history of all these other nodes, I can throw those out. But for something else like breadth-first search, which we'll talk about in a few slides, you actually do need to keep track of everything else. So let me get back to that in a few slides, but for this one I think that's the clean idea: I want to know how I got there, and for that I just need this branch. Another question: is the objective to find the minimum cost to reach a point, or just to find whether you can get there? It depends on what your objective is, on what the search problem is asking. In the case of the farmer example, the search problem asks you to move everything to the other side of the river, so you have that criterion, and you also want the minimum-cost way, so you add that criterion too. It really depends on what the search problem is asking, and some of these nodes might be solutions while some of them might not be. All right, so let's just look at these on the slide. The memory is order of D; it's actually small, which is nice. In terms of time this is not a great algorithm, because even if your branching factor is 2, if the depth of the tree is 50 then this blows up immediately. A lot of the tree search algorithms we're going to talk about have the same problem; they pretty much have the same time
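To put numbers on that remark, even a branching factor of 2 at depth 50 is hopeless to enumerate:

```python
# Size of the full search tree at branching factor b and depth D.
b, D = 2, 50
print(b ** D)  # 1125899906842624 -- about 10^15 leaves
```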
[00:28:08] complexity; we're going to look at very minimal improvements of them, and after that we'll talk about dynamic programming and uniform cost search, which are polynomial algorithms that are much better. All right, so let's actually go back to the tram example and try to write up what backtracking search does. So we defined our model; our model is the search problem, this particular transportation search problem, but it could be anything else. Now we're going to have this main section where we put in our algorithms, and we're going to write them as generally as possible, so we can apply them to other types of search problems. So let's define backtracking search; it takes a search problem, and it can take the transportation problem. All right, and then basically, in backtracking search, what we're doing is recursing on every state, given the history of getting there and the total cost it took us to get there. So at a state, having some history and some accumulated cost so far, we are going to recurse on that state and look at the children of that state; we're going to explore the rest of the subtree from that particular state. All right, so how do we do that? Well, we've got to check whether we're in an end state, and if we are, we can actually update the best solution so far. Let's put that as a to-do. So there are a bunch of things we need to do: figure out if we are in an end state; if we are, update our best solution; if we're not, recurse on the children. All right, we can fill that in later, and in general this recurse function is going to be called on the start state, so let's actually do that too. What backtracking search does is call this recurse function on the initial state, with a history of none, since we don't have any history yet, and a cost of zero, because we haven't really gone anywhere. So we start with the start state and call recurse on it. And how do we recurse on children? Well, we have defined this successor-and-cost function, so by calling it on the state we get a triple of (action, new state, cost). Then we can basically recurse on the new state. I'm not always going to write the histories out in this code; you do need to keep track of the history too, but let's not worry about it. Oh, I guess I am putting it in this one; in the later ones I won't. Basically, the history keeps track of how you got there, and the total cost is what you've accumulated so far plus the cost of this new state and action. Okay, so we need to keep track of the best solution so far, so I'm just going to define a dictionary here, to keep track of it and to play nicely with Python scoping. And then the place we update our best solution so far is that to-do we left: if we're in an end state, we can update the best solution so far. What do we want in our best solution? Well, we want to know what the cost is, so we can start with a cost of infinity, and anything below infinity is better; and we start with an empty history, but we're going to update that history too. That's the initialization of the best solution so far. Then we update it: if we're in an end state and the total cost we have right now is smaller than the best solution so far, we update that best solution, and we update its history with whatever the current history is. All right, and that's it, that's backtracking search. Okay, so let's just make sure it does the thing; to do that, we've actually got to return the best solution so far. All right, so now we have defined this transportation problem, and what I want to do is call backtracking search on it. That all sounds good. I also need to write a print function, to be able to print things, so I'm going to write a generic print function that we can call on any of these types of problems. Let's define a printSolution function that just prints things the way we want them.
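Putting the pieces together, here is a self-contained sketch of the backtracking search and the printSolution helper just described (the problem definition is repeated so the snippet runs on its own; exact names are assumptions):

```python
# The tram problem again, so this snippet is self-contained.
class TransportationProblem:
    def __init__(self, N):
        self.N = N  # number of blocks

    def startState(self):
        return 1

    def isEnd(self, state):
        return state == self.N

    def succAndCost(self, state):
        result = []
        if state + 1 <= self.N:
            result.append(('walk', state + 1, 1))
        if 2 * state <= self.N:
            result.append(('tram', 2 * state, 2))
        return result

def backtrackingSearch(problem):
    # Best solution found so far; a dict lets the nested function
    # update it without fighting Python's assignment scoping rules.
    best = {'cost': float('inf'), 'history': None}

    def recurse(state, history, totalCost):
        if problem.isEnd(state):
            # Update the best solution if this one is cheaper.
            if totalCost < best['cost']:
                best['cost'] = totalCost
                best['history'] = history
            return
        # Otherwise, recurse on the children of this state.
        for action, newState, cost in problem.succAndCost(state):
            recurse(newState, history + [(action, newState, cost)],
                    totalCost + cost)

    recurse(problem.startState(), history=[], totalCost=0)
    return (best['cost'], best['history'])

def printSolution(solution):
    totalCost, history = solution
    print('totalCost:', totalCost)
    for item in history:
        print(item)

printSolution(backtrackingSearch(TransportationProblem(N=10)))
# totalCost: 6, via walking to block 5 and one tram ride to 10
```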
[00:33:11] We get the solution, unpack the cost and history, and just print the cost and history nicely. All right, so I can use this printSolution for pretty much all the other algorithms we'll talk about today too, and it's going to show how we got there with the history. So now I have my print function, I have my backtracking search algorithm, and I've defined my transportation problem, so I can just call it on this transportation problem with ten blocks. As you can see here, the total cost is 6. What this means is that for going from block 1 to block 10, this is the best solution I got: walk, walk, walk, walk, and after that take the tram. I end up in 5, and then it's actually worth taking the tram and paying cost 2. Let's try it out for 20. What do you think the answer for 20 is? Similar here: walk, walk, walk until we get to 5.
[00:34:17] Then we take the tram, and we take the tram again, across. And if it is 100, it's a little bit more interesting: you walk, then you take the tram, and you get to 24, and you want that one walking step to get to 25, which is a good state because then you can just double it to 50 and again to 100. So you walk for that one step and take the tram again. Now, what if I want to try a much larger number of blocks? Is this going to work? No, because remember that the time was order of b^D, and that wasn't so great. Let's try it. Oh, maximum recursion depth exceeded; we can fix that. In Python you can set your recursion limit to be whatever you want, so let's try that. Is this going to work? [Laughter] Well, now it's just going to take a long time; it's not going to give you an answer, because it's just going to take a very long time. All right.
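The recursion-limit fix mentioned here is `sys.setrecursionlimit`; raising the cap removes the RecursionError, but not the exponential running time:

```python
import sys

# CPython limits recursion depth (about 1000 frames by default), so a
# deep search tree raises RecursionError. Raising the limit lets the
# recursion proceed, but the O(b ** D) time cost remains.
sys.setrecursionlimit(100000)
print(sys.getrecursionlimit())  # 100000
```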
[00:35:28] Okay, let's go back here. All right, so that was backtracking search: all it was doing was going over all of this tree, and it was taking exponential time, as you saw. We just tried it out on that transportation problem, and that ran fine. So we defined a search problem and used this really simple search algorithm to find solutions for it, and that's what we have so far. Now what we want to do is come up with a few improvements on this backtracking search. Again, don't get your hopes up, it's not that big of an improvement, but we can do something better. So the first improvement we want to make is an algorithm called depth-first search; you might have heard of it as DFS. The restriction that DFS puts in is that your cost has to be zero. So let me actually draw a line between them in the table. Okay, so we're talking about DFS, and the restriction is that the cost has to be zero. What DFS does is basically exactly the same thing as backtracking search, but once it finds a solution down here, it is done; it doesn't explore the rest of the tree. And the reason it can do that is that the cost of all these edges is zero: if the cost of all these edges is zero, then once I find a solution, I have found a solution, and I don't need to find a better one, because this one is good enough; anything else I find also has a cost of zero, so I can just return this. An example of that is the Rubik's cube: if you find a solution, then you have found a solution. There are a million different ways of getting to a solution, but you just want one, and if you find one, you're happy, you're done. Okay, so as you can see, this is a very, very slight improvement over backtracking search. In terms of space it's still the same thing, order of D, so nothing has changed; it's pretty good, order of D. In terms of time, in practice it is better, because in practice if I find a solution I can just be done and not worry about the rest of the tree. But in theory, the worst-case scenario is still trying out all of the tree, so you write the worst case as order of b^D. So nothing has really changed in terms of the exponential blow-up. A student asked: that tree assumes the subproblems don't overlap, right? You're branching off different states, but in fact subproblems could overlap; in the tram problem you can get to the same place with a different history, but the rest is the same. Yeah, so the question is: do subproblems overlap here, or don't they? You could be in a setting where two subproblems do overlap, and you could add an extra constraint that says: if I've visited a state, don't add it to the tree again. So you have that option, or you have the option of only going down the tree to some particular depth and not trying out everything. In the setting we have here, in its most general form, you're going over all the states and all the possible actions. All right, so that was DFS. The idea of DFS, again, is that you're doing backtracking search and you just stop when you find a solution, because the cost is zero here. So in terms of space, order of D; in terms of time, it's still order of b^D. All right, so that was DFS. We have another algorithm called breadth-first search, BFS, and this is useful when the cost is some constant; it doesn't need to be zero, just some positive constant. What that means is that all these edges have the same
solution here; this tree doesn't need to be nicely formed — I can have a tree that looks like this. [00:40:11] Okay, so if I have a tree that looks like this, with breadth-first search I'm going to try out this layer and see if this guy is a solution; if it's not, I'm going to try this guy — is this the solution? If not, I'm going to try here, and here, and then when I find a solution, when I get here, I'm done. [00:40:24] Right — if I find a solution here, I know it took 2c to get here, two of these c values, and if there is any other solution anywhere else, in this subtree or in this subtree, those solutions are going to be worse than this one, because they're going to have a higher cost — because the cost is constant throughout. [00:40:48] So it's useful if your solutions are somewhere high up in the tree, and then you can find them quickly. So in
terms of time, I get some improvements here, because I can call this depth — the depth of the shallowest solution — a shorter depth, small d. [00:41:02] And in terms of time it's still exponential, but it's order of b to the small d, and this is actually a huge improvement, because if you think about it, the trees become exponentially larger: these lower levels have a lot of things you need to explore. If you have a branching factor of 10, the next layer has a hundred things in it, right? So going down these layers is actually pretty bad. [00:41:27] So the fact that with breadth-first search I can improve the time and limit it to a particular depth — that's pretty good. Still exponential, but pretty good. [Student: with no negative costs, can you also assume this is the best solution?] Yeah, exactly — you're assuming that there are no negative costs, so at
[00:41:46] this point, I know this is the best solution and I'm done — I don't look at anything else. [00:41:50] The problem with breadth-first search is — there's a question there, sorry. [Student: are we assuming the costs are the same?] Yeah, we are assuming all the costs are the same, like all the costs are one. If I don't assume that — if, say, some of these costs are 100 — then there might be a cheaper solution somewhere else; yeah, you need to explore the rest if the costs aren't the same, basically. [00:42:12] Alright, so the problem with BFS is that in terms of memory we are losing: you need to actually keep track of all the nodes that you have explored so far, so in terms of memory this is going to be order of b to the d, kind of similar to the time. [00:42:32] And the reason is: I have explored this guy, and after exploring this guy I still need to have a history of where
it's [00:42:42] going to go, because next time around, when I try out this layer, I need to know everything about this parent. And even when I explore here and this is not a solution, I need to store everything about it, because maybe I don't find a solution at this level and I need to come down — and when I come down, I need to know everything about these nodes. [00:42:59] So I need to store pretty much everything about the tree until I find my solution, and that's where you lose with breadth-first search: in terms of space it's not going to be that great. So in terms of space it's now order of b to the d — a lot worse than what we had before. In terms of time it is better — still exponential, but better. [00:43:18] Okay, alright. So now let's talk about one more algorithm, and then after that we'll jump to dynamic programming. There's a question back
where is a question back here yeah so it is exponential I agree [00:43:39] here yeah so it is exponential I agree so D can be the same as Big D but in [00:43:41] so D can be the same as Big D but in practice if small D is not the same as [00:43:43] practice if small D is not the same as Big D you're winning a lot because [00:43:46] Big D you're winning a lot because because yeah these lower layers are so [00:43:48] because yeah these lower layers are so bad that that people actually like to [00:43:51] bad that that people actually like to call the call the fact that [00:43:53] call the call the fact that or don't be to the small D EFS be big [00:43:59] or don't be to the small D EFS be big worst case in there for the time and [00:44:01] worst case in there for the time and awkward yet so Deena fest needs to go [00:44:06] awkward yet so Deena fest needs to go all the way down to these lower lower [00:44:08] all the way down to these lower lower levels what vnfs can stop at every level [00:44:11] levels what vnfs can stop at every level because it's doing a level by level yeah [00:44:17] because it's doing a level by level yeah so the reason is yeah so like if you're [00:44:19] so the reason is yeah so like if you're saying okay so in DFS we were also [00:44:21] saying okay so in DFS we were also saving some time right like why aren't [00:44:22] saving some time right like why aren't you like calling that out and then the [00:44:24] you like calling that out and then the reason is with DFS you still need to get [00:44:26] reason is with DFS you still need to get to these like more layers and that is [00:44:28] to these like more layers and that is the like that is the place that you're [00:44:31] the like that is the place that you're losing on time so so the fact that [00:44:32] losing on time so so the fact that you're still like losing on time and [00:44:34] you're still like losing on time and sure like you haven't explored these [00:44:35] sure like you haven't 
explored these other ones, but you have already gone down to these lower subtrees, and that's pretty bad. That is why it's order of b to the D for DFS. [00:44:46] Alright, so the last algorithm I want to talk about — this is a cool idea that tries to combine the benefits of BFS and DFS, and it's called DFS iterative deepening. [00:45:01] What this algorithm does is it basically goes level by level, same as BFS, because that way, if you find a solution, you're done — everything is great, right? But what it does is, for every level, it runs a full DFS. And it feels like it's going to take a long time, but it's actually good, because again, if you find your solution early on, it doesn't matter that you have run like a million DFS's so far. [00:45:29] So an analogy for it: imagine that you have a dog, and that dog is DFS, and it's on a leash, and you have
like a short leash; and when it is on that leash it's going to do a DFS and try to search all the space, and it doesn't find anything, so it comes back. Then you extend the leash a little bit, and it's going to search everything again and do a DFS; it comes back, doesn't find anything; you extend the leash again. [00:45:52] So that's the idea — extending the leash is this idea of extending your levels. Okay, so how does DFS iterative deepening — yes? [00:46:06] [Student: if the solution is at the bottom of the tree, it's even worse than both of them.] Yes, exactly — that's a good point. The point, as was mentioned, is that if your solution is down here, you're screwed: it's worse than DFS or BFS, right? You're doing all these DFS's through a higher-level BFS, and it's a terrible situation. But again, in practice we were hoping
the [00:46:35] solutions are not going to end up down the tree — but yeah, if the solutions are down the tree, then you're not gaining anything. [00:46:44] [Student: in what problems do you think DFS iterative deepening would be useful?] Okay, so the question is: in what problems do we think DFS iterative deepening is useful? In general, for problems where I think BFS is going to be useful, usually DFS iterative deepening is useful too. The reason I would think that is that there is some structure to the problem that makes me think I would find my solution early. [00:47:08] So if I have some reason — something about the structure of the problem — to think solutions are at a low depth, I should use one of these algorithms; and DFS iterative deepening helps in terms of space too, so I might as well use that. Alright, so
in terms of space, it's going to be order of small d, and in terms of time it gets the same benefits as BFS. [00:47:38] So that's nice. And again, because it has this BFS outer loop, it has the same sort of constraint on the cost: it's got to be a constant cost. [00:47:52] Alright, so that is our table, and looking at this table, in terms of time you're just not doing well, right? We have these exponential-time algorithms here. We could avoid the exponential space using something like DFS iterative deepening, but still, this time column is just not that great. [00:48:13] And what we want to do now is talk about search algorithms that bring this exponential time down to polynomial time somehow — and there is no magic; we'll talk about how. Dynamic programming is the first.
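The "dog on a leash" idea from the table can be sketched in code: run a depth-limited DFS, and if it finds nothing, extend the limit by one and rerun. This is a minimal illustration, not the course's implementation — the `succ` function, the toy tree, and all names here are made up for the example.

```python
# Sketch of DFS iterative deepening: a depth-limited DFS ("the leash"),
# rerun with a longer leash each time it comes back empty-handed.
# States are integers; succ(s) lists children; is_goal tests for a solution.

def depth_limited_dfs(state, limit, succ, is_goal, path):
    """Backtracking DFS that refuses to go deeper than `limit` levels."""
    if is_goal(state):
        return path
    if limit == 0:
        return None  # leash ran out; back up
    for child in succ(state):
        result = depth_limited_dfs(child, limit - 1, succ, is_goal, path + [child])
        if result is not None:
            return result
    return None

def iterative_deepening(start, succ, is_goal, max_depth):
    """Extend the leash one level at a time, re-running a full DFS each time."""
    for limit in range(max_depth + 1):
        result = depth_limited_dfs(start, limit, succ, is_goal, [start])
        if result is not None:
            return result  # found at the shallowest possible depth, like BFS
    return None

# Toy tree: 1 -> 2,3 ; 2 -> 4,5 ; 3 -> 6,7 ; the goal is node 6.
children = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
succ = lambda s: children.get(s, [])
print(iterative_deepening(1, succ, lambda s: s == 6, max_depth=5))  # [1, 3, 6]
```

Because the outer loop tries shallower limits first, the path returned is a shallowest solution (the BFS-like benefit), while each inner DFS only stores the current path (the DFS-like space benefit).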
[00:48:36] So, the way iterative deepening works is: it sets the level — say the level is 1. If the level is 1, I'm going to do a full DFS; and because I'm doing a full DFS, in terms of space it's the same as DFS. [00:48:51] Say the depth where I find a solution is small d. Now I say the level is 2 — if the level is 2, I'm going to do a full DFS. [00:49:04] And when I do a full DFS, in terms of space I just need to remember my parents, so that's why it's order of D in terms of space; and in terms of time it's order of b to the small d, because if I find my solution here, I'm done — I don't need to explore anything else — and that is exponential in the smaller depth as opposed to the longer one, similar to BFS. [00:49:34] [Student:] Sorry — I still don't understand why, let's say
[00:49:35] I still don't understand why let's say like that's it okay so that's a very [00:49:40] like that's it okay so that's a very good question so yeah I think I know it [00:49:41] good question so yeah I think I know it so you're asking small D small D was the [00:49:43] so you're asking small D small D was the same as Big D if I had my solutions down [00:49:45] same as Big D if I had my solutions down here oh why am I like differentiating [00:49:47] here oh why am I like differentiating here between a small D and Big D right [00:49:50] here between a small D and Big D right is that what you're asking for away [00:49:54] quite large like smoothies log oh I see [00:50:09] quite large like smoothies log oh I see what you're saying so you're saying okay [00:50:10] what you're saying so you're saying okay like when I'm doing when I'm performing [00:50:12] like when I'm doing when I'm performing DFS iterative deepening then I'm doing [00:50:16] DFS iterative deepening then I'm doing DF DF SS so sure it's order of B to the [00:50:19] DF DF SS so sure it's order of B to the D for each of them but then I'm doing D [00:50:21] D for each of them but then I'm doing D of them and if these really large I [00:50:22] of them and if these really large I should put that here sure I do agree [00:50:25] should put that here sure I do agree that is the right time but again like in [00:50:27] that is the right time but again like in the case of this exponential this is so [00:50:30] the case of this exponential this is so bad that we are just dropping that like [00:50:32] bad that we are just dropping that like don't even worry about that extra [00:50:33] don't even worry about that extra t-that's come see but it is true you [00:50:35] t-that's come see but it is true you need to have that extra D like in [00:50:37] need to have that extra D like in general if you want to talk about it I [00:50:38] general if you want to talk about it I don't want to move on to dynamic [00:50:39] 
programming — but last question, just on top of that. [Student: presumably, though, you're saving the work that you've done during the prior iteration, so you're not really computing anything larger than the capital D, correct?] Yeah, that's right — [00:50:51] the worst-case scenario is order of b to the D. [00:50:55] Alright, so let's move on to dynamic programming. So what does dynamic programming do? Maybe I'll still use this, because I might need it later — okay, let me erase my tram drawing here. [00:51:11] So the idea of dynamic programming — we have already seen this in the first lecture — is: I have a state s, and I want to end up in some end state. To do that, I can take an action a that takes me to s prime — I can end up in s prime with the cost of this action a — and then from there I can do a bunch of things, I don't know what, but I'll end up in some end state.
and what I'm [00:51:41] in some end State and what I'm interested in actually computing is for [00:51:44] interested in actually computing is for this state s is to find what is future [00:51:48] this state s is to find what is future cost of and this part of it is future [00:51:54] cost of and this part of it is future cost that's fine and I don't know what [00:51:57] cost that's fine and I don't know what it is but I can just leave it as future [00:51:59] it is but I can just leave it as future cost of s prime so if I want to find [00:52:01] cost of s prime so if I want to find what future cost of s is maybe I should [00:52:04] what future cost of s is maybe I should say feels a little bit to the right one [00:52:06] say feels a little bit to the right one sense right cost of si for this edge [00:52:10] sense right cost of si for this edge erase this what I'm interested in [00:52:13] erase this what I'm interested in finding is future cost of my stakes well [00:52:17] finding is future cost of my stakes well what is that equal to well that's going [00:52:20] what is that equal to well that's going to be equal to this cost of Si right [00:52:23] to be equal to this cost of Si right like at state s I'm gonna take action [00:52:24] like at state s I'm gonna take action okay so it's gonna be cost of Si plus [00:52:30] okay so it's gonna be cost of Si plus future cost of s Prime again I don't [00:52:32] future cost of s Prime again I don't know what that is but that's future [00:52:34] know what that is but that's future Dorset's problem so this is future cost [00:52:38] Dorset's problem so this is future cost of s Prime [00:52:40] of s Prime and then you might ask well what is a [00:52:42] and then you might ask well what is a where does a come from how do I know [00:52:44] where does a come from how do I know what a is I don't know [00:52:46] what a is I don't know I'm gonna pick an a that minimizes this [00:52:49] I'm gonna pick an a that minimizes this some around 
it. [00:52:57] Okay, so FutureCost(s) is just going to be equal to the minimum, over all possible actions, of Cost(s, a) plus FutureCost(s'), and it's going to be zero if you're in an end state — if IsEnd(s) is true. [00:53:16] So if I already know I'm in an end state, there is no future cost — it's going to be equal to 0; otherwise, the future cost is just the cost of going from s to the next state, plus the future cost from there. [00:53:33] So that is how one would go about formalizing this as a dynamic programming problem. And how do I find what s prime is? Well, I wrote this successor-and-cost function in my code — remember, we know how to find a successor given that we are in state s and we are taking action a — so s prime is just calling that successor function on s and a. [00:53:59] Alright, so let's go back to a route-finding example.
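The recurrence just stated — FutureCost(s) = 0 if IsEnd(s), and min over a of Cost(s, a) + FutureCost(Succ(s, a)) otherwise — can be written almost verbatim as code. A minimal sketch: the successor-and-cost interface and the tiny walk/jump example below are hypothetical stand-ins, not the lecture's actual code.

```python
# Naive translation of the recurrence:
#   FutureCost(s) = 0                                            if IsEnd(s)
#                 = min over a of [ Cost(s,a) + FutureCost(s') ] otherwise
# succ_and_cost and is_end are stand-ins for the problem's interface.

def future_cost(s, succ_and_cost, is_end):
    if is_end(s):
        return 0
    return min(cost + future_cost(s_prime, succ_and_cost, is_end)
               for action, s_prime, cost in succ_and_cost(s))

# Tiny made-up example: get from block 1 to block 3;
# a step costs 1, a two-block jump costs 3.
def succ_and_cost(s):
    triples = [('step', s + 1, 1)]
    if s + 2 <= 3:
        triples.append(('jump', s + 2, 3))
    return triples

print(future_cost(1, succ_and_cost, lambda s: s == 3))  # two steps beat one jump: 2
```

As written this is plain recursion with no caching, so it re-solves overlapping subproblems — exactly the inefficiency the rest of the lecture removes.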
This is a slightly different route-finding example: let's say that you want to find the minimum-cost path going from city 1 to some city n. We're moving forward — we can always just move forward — and it costs c_ij to go from city i to city j. [00:54:20] Okay, so this is my new search problem, and this is how the tree would look. If I draw the search tree for this, I can start from city 1, and I can end up in city 2 or 3 or 4; then if I'm in city 2, I can end up in 3 or 4; from 3, I can end up in 4; and so on. [00:54:39] I can have a much larger version of it: if I'm talking about going to city 7, then I have this type of tree. And just by looking at this tree, you see all these subtrees being repeated throughout, right? Just looking at 5, the future cost of 5 is going to be the same thing throughout.
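The repetition can be made concrete by counting. A small illustrative sketch (the counting code is mine, not the lecture's): enumerate the naive search tree for the forward-moving cities problem and count how many times each city gets expanded, versus how many distinct states there really are. The costs c_ij are irrelevant for the count, so they are omitted.

```python
# Cities 1..n, and you may only move forward (i -> j for j > i).
# The naive tree expands the same city over and over, even though the
# future cost from that city is always the same.

from collections import Counter

def count_expansions(n):
    counts = Counter()
    def explore(city):
        counts[city] += 1
        for nxt in range(city + 1, n + 1):
            explore(nxt)
    explore(1)
    return counts

counts = count_expansions(7)
print(counts[5])    # city 5 is expanded in 8 different subtrees
print(len(counts))  # yet there are only 7 distinct states
```

City k is expanded once per forward path from 1 to k, i.e. 2^(k-2) times, so the whole tree has exponentially many nodes while the graph has only n states — the saving dynamic programming exploits.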
And if I use something like the tree search we have talked about, then I have to go and explore this whole tree, and that's going to be really time-consuming. [00:55:08] So the key insight here is: this value of future cost only depends on the state — it only depends on where I am right now. And because of that, maybe I can just store the value the first time I compute the future cost of 5, and in the future I just look it up and don't recompute the future cost of 5. [00:55:28] So the observation here is that the future cost only depends on the current city. So my state in this case is the current city, and that state is enough for me to compute the future cost. [00:55:43] Alright — if you think about what we have talked about so far, we have thought about these search problems where we think of the state as the past sequence
of actions — the history of actions you have taken, and all that. But right now, for this problem, the state is just the current city; that's enough. [00:55:59] And because of that, you're getting all these exponential savings in time and space, because, again, I can compute the future cost of 5 there and collapse that whole tree into this graph, and just go about solving my search problem on this graph as opposed to that whole tree. [00:56:16] So that's where you get the savings from dynamic programming. And I just want to emphasize that again — let me actually do this. The key idea here, like I was saying, is that there's no magic happening: the key idea is figuring out what your state is. It's actually important to think about what your state is. [00:56:36] In this case, we're assuming the state — a summary of all the past actions we have taken — is sufficient for us
to [00:56:43] that we have taken sufficient for us to choose the optimal future okay so so [00:56:45] choose the optimal future okay so so that's like a mouthful but basically [00:56:47] that's like a mouthful but basically what that means is the only reason [00:56:50] what that means is the only reason dynamic programming works and for this [00:56:52] dynamic programming works and for this particular example we just saw is the [00:56:54] particular example we just saw is the state the way we define it is enough for [00:56:56] state the way we define it is enough for us to plan for the future I might have a [00:56:59] us to plan for the future I might have a different problem where the state like I [00:57:00] different problem where the state like I defined a state in a way that it's not [00:57:02] defined a state in a way that it's not enough for me to define for future but [00:57:04] enough for me to define for future but if [00:57:05] if I want to use dynamic programming then I [00:57:06] I want to use dynamic programming then I got to be smart about choosing my state [00:57:08] got to be smart about choosing my state because because that is the thing that [00:57:09] because because that is the thing that decides for the future so so for example [00:57:12] decides for the future so so for example for this problem like I might visit city [00:57:14] for this problem like I might visit city 1 then 3 then 4 and then 6 and and for [00:57:18] 1 then 3 then 4 and then 6 and and for solving this particular search problem I [00:57:19] solving this particular search problem I just need to know that I'm in City 6 [00:57:21] just need to know that I'm in City 6 that is enough but like maybe I have [00:57:24] that is enough but like maybe I have some other problem that requires knowing [00:57:26] some other problem that requires knowing 1 3 4 and 6 and then because of that [00:57:28] 1 3 4 and 6 and then because of that maybe I need to know the full tree ok so [00:57:30] 
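For reference, the future-cost recurrence being implemented here — in the course's usual notation, where IsEnd, Actions, Cost, and Succ all come from the search problem definition — is:

```latex
\text{FutureCost}(s) =
\begin{cases}
0 & \text{if } \text{IsEnd}(s) \\
\displaystyle\min_{a \in \text{Actions}(s)} \bigl[\, \text{Cost}(s, a) + \text{FutureCost}(\text{Succ}(s, a)) \,\bigr] & \text{otherwise}
\end{cases}
```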
[00:57:30] So this is where the saving comes from: figuring out what the state is, and defining that right. All right, so we'll come back to this notion of state and think about it a little more carefully, but before that, let's just implement dynamic programming real quick.

[00:57:46] So let's go back to our tram problem. I'm back in the tram problem, and let's implement dynamic programming. Okay, so how do we do this? You're basically just writing that math over there into code — that's all we're doing. So we're going to define this future cost: if you're in an end state, return 0. If you're not in an end state, we're going to add up the cost plus the future cost of s′. How do we get s′? Well, you're going to call this successor-and-cost function, so we get the action, the new state, and the cost, and then you're going to take the minimum over all possible actions — the minimum of cost plus future cost of the new state. That is literally what you have on the board. And we return the result, so that is futureCost. What does dynamicProgramming do? It should return futureCost of the initial state, the start state. And you can return the history if you want — in this case, I'm not returning the history.

[00:58:53] Okay, so how do I get savings? Well, I've got to put in a cache, right? That's the only way I'm going to get savings. So that is where I put the cache: if the state is already in the cache, I'll just return my cached value; otherwise I compute it. Question over there — [Student: how are we getting future cost of states? Would we have to have a function implemented that calculates future cost?] So we have this function, futureCost of a state, and it's going to call futureCost recursively: the future cost of a state is equal to the cost of the state and action — in this function I'm saying try out all possible actions — plus the future cost of s′, and s′ comes from the successor-and-cost function.

[00:59:49] All right, and so we do the caching the proper way too, and now we have dynamic programming, so we can basically call this on our tram problem. I'm going to move forward. Okay, so let's do printSolution of dynamicProgramming on our problem — you can again play around with this. The only sanity check I'm doing is whether it gives me the same solution as backtracking search, because I know how that works. So let's just call it on 10, and yeah, it gave me the same answer.
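A runnable sketch of what's being typed here, assuming the course's interface style — a problem object exposing `startState`, `isEnd`, and a `succAndCost` method returning (action, newState, cost) triples. The names and the toy tram problem (walk +1 for cost 1, tram ×2 for cost 2) follow the lecture, but the code below is a reconstruction, not the instructor's verbatim file:

```python
class TransportationProblem:
    """Tram problem from the lecture: from block s you can walk to s+1
    (cost 1) or take the magic tram to 2*s (cost 2). Goal: reach block N."""
    def __init__(self, N):
        self.N = N

    def startState(self):
        return 1

    def isEnd(self, state):
        return state == self.N

    def succAndCost(self, state):
        # Returns a list of (action, newState, cost) triples.
        result = []
        if state + 1 <= self.N:
            result.append(('walk', state + 1, 1))
        if 2 * state <= self.N:
            result.append(('tram', 2 * state, 2))
        return result

def dynamicProgramming(problem):
    cache = {}  # state -> FutureCost(state); the memoization that gives the savings

    def futureCost(state):
        if problem.isEnd(state):
            return 0
        if state in cache:          # already computed: just look it up
            return cache[state]
        # min over actions of Cost(s, a) + FutureCost(Succ(s, a))
        result = min(cost + futureCost(newState)
                     for action, newState, cost in problem.succAndCost(state))
        cache[state] = result
        return result

    return futureCost(problem.startState())

print(dynamicProgramming(TransportationProblem(N=10)))  # → 6 (walk 1→2→3→4→5, tram 5→10)
```

As in the lecture, this version returns only the minimum total cost, not the history of actions; recovering the path would mean also caching the argmin action at each state.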
[01:00:40] All right, so let's go back. Okay, so one assumption that we have here, just to point out, is that we're assuming this graph is going to be acyclic. That's an assumption we need to make when we're solving this dynamic programming problem, and the reason is, well, we need to compute this future cost, right? For me to compute the future cost of s, I need to have already thought about the future cost of s′, so there is this natural ordering that exists between my states. If I think about an example where there are cycles, then I don't have that ordering. Let's say I want to go from A to D here: to compute the future cost of B, I don't really know if I should have computed the future cost of A before, or C before, or in what order I should have gone to compute these future costs. So you actually need to have some way of ordering your states in order to compute these future costs and apply dynamic programming — that's why we can't really have cycles with this algorithm. But we are going to talk about uniform cost search, which actually allows us to have cycles, in a few slides.

[01:01:59] As for the running time, this is actually polynomial time — on the order of the number of states, so order N, where N is the number of states.

[01:02:12] All right, so let's talk about the idea of states a little bit more, because I think this is actually interesting. So let's just reiterate what a state is: a state is a summary of all past actions sufficient to choose future actions optimally. Okay, so is everyone happy with what a state is?
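To see concretely why acyclicity matters, here is a tiny hypothetical graph (not from the lecture) with a cycle between A and B. The same future-cost recursion never bottoms out — FutureCost(A) needs FutureCost(B) and vice versa — and Python eventually stops it with a RecursionError:

```python
# Hypothetical cyclic graph: A -> B (cost 1), B -> A (cost 1), B -> End (cost 1).
# There is no valid order in which to compute FutureCost here, because
# FutureCost(A) depends on FutureCost(B) and FutureCost(B) depends on FutureCost(A).
edges = {
    'A':   [('B', 1)],
    'B':   [('A', 1), ('End', 1)],
    'End': [],
}

def futureCost(state):
    if state == 'End':
        return 0
    # Naive recurrence, no cycle handling: recurses forever on A <-> B.
    return min(cost + futureCost(nxt) for nxt, cost in edges[state])

try:
    futureCost('A')
except RecursionError:
    print('cycle: the recurrence never terminates')
```

Note that memoization alone does not fix this: the cache entry for a state is only written after its future cost has been fully computed, so the A ↔ B loop still recurses forever. That is the gap uniform cost search fills.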
what status so [01:02:29] so so everyone happy but what status so now what we want to do is we want to [01:02:30] now what we want to do is we want to figure out how we should define our [01:02:33] figure out how we should define our state space because again this is an [01:02:34] state space because again this is an important problem right like how we were [01:02:36] important problem right like how we were defining state space is the thing that [01:02:38] defining state space is the thing that gets the dynamic program you're working [01:02:39] gets the dynamic program you're working so so we got it we got to think about [01:02:41] so so we got it we got to think about how to do that so so let's go back to [01:02:43] how to do that so so let's go back to this example and let's just change that [01:02:45] this example and let's just change that a little bit so so this is the same [01:02:46] a little bit so so this is the same example of I'm going from city one to [01:02:48] example of I'm going from city one to see the end I can only move forward and [01:02:51] see the end I can only move forward and it costs CIJ to go from any city I to [01:02:53] it costs CIJ to go from any city I to say DJ and I'm gonna add a constraint [01:02:56] say DJ and I'm gonna add a constraint and the constraint is I can't visit [01:02:58] and the constraint is I can't visit three odd cities in a row so what that [01:03:02] three odd cities in a row so what that means is maybe I'm in state 1 and then I [01:03:08] means is maybe I'm in state 1 and then I went to state 3 or cd1 I went to city 3 [01:03:12] went to state 3 or cd1 I went to city 3 and then after that can I go to City [01:03:15] and then after that can I go to City Simon well no based on this constraint [01:03:18] Simon well no based on this constraint that I've added I can't do that right so [01:03:21] that I've added I can't do that right so I wanted to find a state space that [01:03:23] I wanted to find a state space that 
allows me to keep track of these things [01:03:25] allows me to keep track of these things so I can solve this new search problem [01:03:27] so I can solve this new search problem with this new constraint so so how [01:03:30] with this new constraint so so how should i how should i do so in the [01:03:33] should i how should i do so in the previous problem when we didn't have the [01:03:35] previous problem when we didn't have the constraint our state was just the [01:03:38] constraint our state was just the current city like previously you just [01:03:40] current city like previously you just cared about the current city and the [01:03:43] cared about the current city and the reason we cared about the current city [01:03:44] reason we cared about the current city is like like we were solving the search [01:03:46] is like like we were solving the search problem like they end up in a city you [01:03:48] problem like they end up in a city you need to know how I'm going where I [01:03:50] need to know how I'm going where I should go from 3 so I should I should [01:03:51] should go from 3 so I should I should have my current city in general right so [01:03:53] have my current city in general right so so for the previous problem without the [01:03:55] so for the previous problem without the constraint current city was enough but [01:03:58] constraint current city was enough but now current city is not enough right I [01:03:59] now current city is not enough right I actually need to know like something [01:04:02] actually need to know like something about my path ok yeah that's actually a [01:04:09] about my path ok yeah that's actually a very good point so one suggestion is [01:04:12] very good point so one suggestion is have a count of how many odd States [01:04:14] have a count of how many odd States another maybe like and different maybe [01:04:15] another maybe like and different maybe the first thing I would come to our mind [01:04:17] the first thing I would come 
to our mind is something [01:04:17] is something simpler so maybe we say well the state [01:04:19] simpler so maybe we say well the state is similar to its like the state like [01:04:24] is similar to its like the state like when we say well the state is previous [01:04:26] when we say well the state is previous city and current city [01:04:29] city and current city okay so this is one possible option for [01:04:32] okay so this is one possible option for for my state right cuz cuz if I have [01:04:34] for my state right cuz cuz if I have this if I have this guy as my state and [01:04:37] this if I have this guy as my state and then that is enough right like if I my [01:04:39] then that is enough right like if I my current city is three I know my previous [01:04:41] current city is three I know my previous city was one I know I shouldn't go to [01:04:44] city was one I know I shouldn't go to seven like that's enough for me to make [01:04:45] seven like that's enough for me to make like future decisions yeah but there is [01:04:49] like future decisions yeah but there is a problem with this well what is the [01:04:51] a problem with this well what is the problem so I have n cities right so so [01:04:55] problem so I have n cities right so so current city can take any possible [01:04:57] current city can take any possible action and n possible states previous [01:05:00] action and n possible states previous city can also take and possible options [01:05:03] city can also take and possible options has impossible options so if I think [01:05:04] has impossible options so if I think about the size of my state space it is n [01:05:07] about the size of my state space it is n squirt if I decide to choose this state [01:05:10] squirt if I decide to choose this state but if I decide to choose this state I'm [01:05:12] but if I decide to choose this state I'm going to have n squirt [01:05:14] going to have n squirt States and remember we are doing this [01:05:15] States and 
[01:05:14] And remember, we are doing this dynamic programming thing — we need to actually write down how to get to all of those states, and that's going to be big. But there is an improvement to this, and that's the improvement that was suggested: I don't actually need this whole giant previous city, which has n options. I can just keep track of whether the previous city was odd or not — that's enough, right? I don't care if it was 1 or 3 or whatever; I just care whether the previous city was odd. So another option for my state is to know whether the previous city was odd or not, plus my current city. Do we need the current city? Yes, because we need to know how to get around from there. And then this brings down my state space. How does it bring it down? Well, what's the size of my state space? The current city can take n possible values, and whether my previous city was odd — that's two. So I just brought my state space down from something that was n² to 2n, and that's a good improvement.

[01:06:21] So in general, when you're picking these state spaces, you should pick the minimal sufficient thing for you to make decisions. It's got to be a summary of the previous things that you need to make future decisions, but pick the minimal one, because you're storing these things, and it actually matters to pick the smallest one. So here is an example of exactly that: my state is now this tuple of whether the previous city was odd or not, and my current city. So if I start at city 1, well, I don't have a previous city, and I'm at city 1. I could go to city 3, and I end up in (odd, 3). I could try to go to city seven — well, that's not possible because of the constraint — and you can build out the rest of the tree like that. And the way I'm counting this is: my state is a tuple of two things, right? Whether the previous city is odd or even — that's two options. And then my current city — it could be city 1, city 2, city 3 — so that's n options. I have n options here and two options there, and that's why I'm saying my whole state space is 2 times n.

[01:07:38] All right, so let's try out this next example. Talk to your neighbors about this, and if you have ideas, just let me know in a minute. So what is the difference here? You're traveling from city 1 to city n, and the constraint has changed: now we want to visit at least three odd cities. So that's what we want to do, and the question is: what is the minimal state?

[01:08:29] All right, any ideas? What is a possible state? Don't worry about the minimal one for now — what do I need to keep track of? [Student: the number of odd cities.] Okay, so is that it? Do I just need to know the number of odd cities? What I meant is, I also need to have the current city, right? So one possible option for this new example — I'm going to write it here — is the number of odd cities visited, and my current city. For these particular problems that I've defined here, I need to know where I am, so the current city is a given. Okay, so I want to visit at least three odd cities, so one possible option is to just have a counter: I keep counting the number of odd cities. So this could be one potential state.

[01:09:49] [Student: do the cities need to be different?] So the question is, do the cities need to be different? The way we're defining the problem, we're always moving forward: if I'm in city 1, I can just move forward, I can't go back. But when we talk about the state space, we're talking about the more general setting — some of that 2n might not even be reachable, and that's all right. So this is one option, but I can actually do better than this.

[01:10:24] [Student: once you've visited at least three odd cities, you're done.] Right, so the suggestion there is: you need at least three odd cities, then at least two odd cities, then at least one odd city, and then you're done. One way of formalizing that — and that's exactly right — is: I don't care if I have four odd cities now, or five odd cities; as long as I have at least three, that's good enough. One odd city, two odd cities, three-plus odd cities — that's enough for me. Okay, so if I had the plain counter, the state space is going to be n options for the current city, and the number of odd cities is around n over 2, so it's going to be n² over 2. But if I use this new suggestion, where I don't keep track of four, five, six, seven — I just keep track of one, two, and three-plus — then my state space ends up being 3 times n. I can formally write that as: s equals (the minimum of the number of odd cities and 3, and the current city), and with this state space the size is 3n. So I just brought n² over 2 down to 3n, and that's nice.

[01:11:45] [Student: do we not also need an option for zero odd cities?] We're starting from city 1, so you're already counting that in — but yeah, in general you could have zero.

[01:11:56] All right, so that was that. This is how it looks: you can think of your state space again as a tuple of whether I've visited one, two, or three-plus odd cities, and the city I'm in. I have another example here you can think about later — work it out at home — but basically the question is, again: you're going from city 1 to n, and you want to visit more odd cities than even cities. What would be the minimal state? You can talk about that offline.

[01:12:26] So the summary so far is that a state is going to be a summary of past actions sufficient to choose future actions optimally, and dynamic programming is not doing any magic: it's using this notion of state to bring this exponential-time algorithm down to a polynomial-time algorithm, with the trick of memoization and the trick of choosing the right state.
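The "at least three odd cities" variant can be sketched the same way, with the capped-counter state just described: state = (min(# odd cities visited, 3), current city), giving a 3n state space instead of n²/2. The graph below is hypothetical — cities 1..n, forward moves only, and every move costing 1 for illustration, whereas the lecture's c_ij costs are arbitrary:

```python
def succAndCost(state, n):
    """State = (min(number of odd cities visited, 3), current city).
    You may move forward from city i to any city j > i; for illustration
    every move costs 1 (the lecture's costs c_ij are arbitrary)."""
    numOdd, city = state
    result = []
    for nxt in range(city + 1, n + 1):
        newOdd = min(numOdd + (nxt % 2), 3)   # cap the counter at 3
        result.append((nxt, (newOdd, nxt), 1))
    return result

def minCost(n):
    """DP over the (capped odd count, city) state space: minimum cost to go
    from city 1 to city n having visited at least three odd cities."""
    cache = {}

    def futureCost(state):
        numOdd, city = state
        if city == n:
            # Only a valid end state if we've seen at least three odd cities.
            return 0 if numOdd >= 3 else float('inf')
        if state not in cache:
            cache[state] = min(cost + futureCost(s2)
                               for _, s2, cost in succAndCost(state, n))
        return cache[state]

    # City 1 is odd, so the count starts at 1 (this answers the "zero" question).
    return futureCost((1, 1))

print(minCost(10))  # → 3, e.g. 1 → 3 → 5 → 10: odd cities 1, 3, 5
```

With unit costs the answer for n = 10 is 3 moves; if n is small enough that three odd cities are impossible, the DP correctly returns infinity.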
the trick of using memorization and with the [01:12:44] trick of using memorization and with the trick of choosing the right state okay [01:12:47] trick of choosing the right state okay and we've talked about dynamic [01:12:48] and we've talked about dynamic programming and how it doesn't work for [01:12:51] programming and how it doesn't work for cyclic graphs and now we want to spend a [01:12:53] cyclic graphs and now we want to spend a little bit of time talking about uniform [01:12:55] little bit of time talking about uniform cost search [01:12:57] cost search and how that can help with the cycles so [01:13:01] and how that can help with the cycles so if you guys have seen Dijkstra's [01:13:02] if you guys have seen Dijkstra's algorithm this is very similar to [01:13:04] algorithm this is very similar to de-stress like yeah so it's basically [01:13:07] de-stress like yeah so it's basically like stars but alright so let's let's [01:13:11] like stars but alright so let's let's actually talk about this so so the [01:13:12] actually talk about this so so the observation here is that when we when we [01:13:14] observation here is that when we when we think about the cost of getting from [01:13:16] think about the cost of getting from start state to some s prime well that is [01:13:20] start state to some s prime well that is going to be equal to cost of going from [01:13:23] going to be equal to cost of going from s to s prime and then some past costs of [01:13:26] s to s prime and then some past costs of us and then been dynamic programming [01:13:28] us and then been dynamic programming like we make sure that we have this [01:13:30] like we make sure that we have this ordering and these things are computed [01:13:32] ordering and these things are computed in order so we're not worried about like [01:13:34] in order so we're not worried about like visiting the states like multiple times [01:13:35] visiting the states like multiple times but within in uniform path 
[01:13:39] But in uniform cost search we might visit a state multiple times, and if you have cycles we don't know what order to go in. The order we can go in is this: we can actually compute a past cost (a suggested path cost) and go over the states in order of increasing past cost. [01:13:57] So uniform cost search enumerates states in order of increasing past cost. In this case we need to make an assumption: we need to assume that the costs are non-negative, so I'm making that assumption for uniform cost search. [01:14:16] Here is an example of uniform cost search running... oh, we don't have internet. There is a video of uniform cost search running in action, and if I have time I'll connect to the internet and get it working. But let's talk about the high-level idea of uniform cost search.
[01:14:34] In uniform cost search we have three sets that we need to keep track of. One is the explored set: the states for which we have found the optimal path, the states we're sure about; we've computed the best possible way to get there and we're done with them. Then we have another set called the frontier: these are states that we have seen, where we've computed a cost of getting there (we know somehow how to get there and what the cost would be), but we're just not sure if that was the best way of getting there. [01:15:06] So you can think of the frontier as the known unknowns: I know they exist, but I'm not sure about the optimal way of getting there. And then finally we have the unexplored states, which I haven't even seen yet; I don't even know how to
[01:15:22] get there, and you can think of them as the unknown unknowns. So that's how you would think about these three sets. [01:15:29] Let's actually work out an example for uniform cost search; I'm going to show how uniform cost search runs on this example. As I said, we're going to keep track of three sets: unexplored, frontier, and explored. [01:15:57] All right, so everything is in unexplored at the beginning: A, B, C, and D. What I want to do is go from A to D; I want to find a minimum-cost path from A to D given this graph. [01:16:18] So I'm going to take my initial state, that's A, and put A on my frontier, and it costs zero to get to A because I'm just starting at A. So that's on my frontier. Then the next step is that I'm
[01:16:32] going to pop the thing with the lowest cost off my frontier. There's one thing on my frontier, so I'll just pop that one thing off and move it to explored: the cost of getting to A is zero. [01:16:47] Then, after popping it off my frontier, I'm going to see how I can get from A to any other state. From A I can get to B, that's one option, with a cost of one. Where else can I go? I can go to C, with a cost of 100. [01:17:09] So what I just did is move B from unexplored to the frontier (and I know how to get there, from A), and move C to the frontier (and I know how to get there). Now it's the next round, and I'm looking at my frontier. A is not on my frontier anymore, it's in explored, and I'm going to pop off the thing with the best cost on my frontier. Well, what is that? That's B, so I'm going
[01:17:35] to move B to my explored set. The best way to get to B I already know: that's from A to B, so everything is good. Now that I've popped B off my frontier, I'm going to look at B and see what states I can get to from it. From B I can go to A, but A is already in explored; I already know the best way to get to A, so there's no reason to do that. [01:18:01] From B I can get to C, and that way I can reach C with a cost of 1 plus the cost of B, which is already 1, so 2. So I'm going to erase the 100, because there's a better way of getting there, and that's through B. [01:18:19] And then from B I can get to D, so I'm going to move D from unexplored to the frontier: I can get to it from B, with a cost of 101, because it's 100 plus the cost of getting to B. All right, so I'm done
[01:18:42] exploring everything I can do from B. Going back to my frontier again: A and B are not on my frontier, I just have C and D there. [01:18:49] I'm going to pop off the thing with the best cost, and that is C. I'm going to move it to explored with a cost of two, and the best way to get to it is from B. So we're done with C, and now we see where we can go from C. From C I can go to A; well, that's done, it's already in the explored set, so I'm not going to touch it. Similar thing with B: already in explored, no need to worry about it. [01:19:14] From C I can get to D, and if I want to get to D from C, what would the cost be? It would be two plus one, so I can update this to three, and update the way to get to D to be from C. And then we're done: going back to the frontier, the only thing left on it is D. I'll just pop that off and
[01:19:40] add it to explored, and that is three; that's what I have in my explored set. So the way to get from A to D is by taking this route, and it costs three: A, B, C, and D. Okay, is that clear? [01:20:00] All right, there are two slides left and they're probably going to kick us out soon, so I'll do this next time. Of the two slides left, one just goes over the pseudocode, so take a look at that (the code is online), and there's a small theorem that says this is actually doing the right thing. I'll talk about that next time.

================================================================================
LECTURE 018
================================================================================
Search 2 - A* | Stanford CS221: Artificial Intelligence (Autumn 2019)
Source: https://www.youtube.com/watch?v=HEs1ZCvLH2s
---
Transcript

[00:00:05] Okay, so hi everyone. Today we continue talking about search; that's what we're going to start doing, finishing off some of the things we started talking about last time, and then after that
[00:00:21] switching to some of the more interesting topics, like learning. A few announcements: the solutions to the old exams are online now, so if you want to start studying for the exam you can do that. Start looking at some of those problems; I think that would be useful. [00:00:36] Actually, let me start with the Search 2 lecture, because it has a review of some of the topics we've talked about, so it might be easier to do that. Also, I'm not connected to the network, so we're not going to do the questions or show the videos, because I have a hard time connecting to the network in this room. [00:00:58] Okay, all right, so let's continue talking about search. If you remember, we had this city block problem, so let's go back to that problem and try to do a review of some of the search algorithms we talked
[00:01:17] about last time. Suppose you want to travel from city 1 to city n, only going forward, and then from city n you want to go back to city 1, going only backwards. [00:01:27] So the problem statement is like this: you're starting in city 1, you're going forward, and you're getting to city n, maybe along a route like this; and then after that you want to go backwards and get to city 1 again, going through some of these cities. That's the goal, and the cost of going from any city i to city j is equal to c_ij. [00:02:00] So the question is: which of the following algorithms could you use to solve this problem? It could be multiple of them. We have depth-first search, breadth-first search, dynamic programming, and uniform cost search; these were the algorithms we talked about
[00:02:16] last time. So maybe just talk to your neighbors for a minute, and then we can vote on each one of these. [00:03:05] [students discuss] Okay, so let's start talking about this. How about depth-first search: how many people are saying we can use depth-first search? How many people think we can't? It's a pretty even split. So some people think we can't use depth-first search; what are some reasons? [00:03:34] Yeah, so here we're basically going from city 1 to city n, and each one of these edges has a cost of c_ij. I'm just saying c_ij is greater than or equal to zero; that's the only thing I'm assuming about c_ij. But if you remember depth-first search, you really wanted the costs to just be equal to zero, because (remember that whole tree) the whole point of depth-first search was that I could just stop
[00:03:56] whenever I found a solution, and we were assuming that the cost of all the edges was just equal to zero. So we can't really use depth-first search here, because our costs are not zero. [00:04:07] With that reasoning in mind, how about breadth-first search, can it be used? [00:04:22] So what's being suggested is: can we think about the problem as going from city 1 to city n, and then after that introduce a whole new problem that continues from city n and goes back to city 1? Let me get back to that point in a second, because you could potentially think about it that way; that might actually be an interesting way of looking at it. But irrespective of that, I can't use depth-first search (so far I'm just talking about depth-first search): no matter how I look at the problem, the costs are going to be nonzero, and because the costs
[00:04:54] going to be nonzero so because the costs are going to be nonzero I can't use that [00:04:56] are going to be nonzero I can't use that first search so so let's talk about [00:04:58] first search so so let's talk about matters so how about birth research can [00:05:00] matters so how about birth research can I use breadth-first search you cannot [00:05:09] I use breadth-first search you cannot use fresh first search here because for [00:05:11] use fresh first search here because for breadth-first search if you remember you [00:05:13] breadth-first search if you remember you really wanted all the costs to be the [00:05:14] really wanted all the costs to be the same they didn't need to be 0 date but [00:05:16] same they didn't need to be 0 date but they needed to be the same thing because [00:05:18] they needed to be the same thing because then you could just go over the levels [00:05:19] then you could just go over the levels and then here I'm not like I'm not [00:05:21] and then here I'm not like I'm not saying I'm not putting any restrictions [00:05:23] saying I'm not putting any restrictions on CIJ being the same thing ok so now [00:05:26] on CIJ being the same thing ok so now let's talk about dynamic programming how [00:05:28] let's talk about dynamic programming how about dynamic programming can be used [00:05:29] about dynamic programming can be used sine amock programming alright so that [00:05:33] sine amock programming alright so that looks right right like we could use [00:05:34] looks right right like we could use dynamic programming here everything [00:05:36] dynamic programming here everything looks ok CI J's are positive looks fine [00:05:41] looks ok CI J's are positive looks fine how about actually one question so so [00:05:46] how about actually one question so so don't I have cycles here we kind of [00:05:48] don't I have cycles here we kind of briefly talked about this already [00:05:49] briefly talked about this already so don't I have 
[00:06:03] a cycle here? Well, we can actually use dynamic programming even if it kind of looks like we have a cycle, and the reason is that we can use a trick: we can basically draw this out again, going forward all the way to city n, and then after that going backwards, including the directionality too. [00:06:26] So all I'm doing is extending the state space to be not just the city, but the city in addition to the direction we're going. If I'm in city four here, it's (city four, forward), and if at some point in the future I'm in city four again, it's (city four, backward). I'll keep track of both the city and the direction, and when I do that I'm breaking the cycle; there are no cycles anymore, and I can actually use dynamic programming.
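The extended state space she describes can be sketched like this: a state is a (city, direction) pair, and reaching city n flips the direction, which makes the graph acyclic. The function and variable names here are my own, and the edge-cost dictionary is a made-up placeholder, not the course's code.

```python
# Sketch of the (city, direction) state space for the round trip.
# c[(i, j)] is the cost of traveling between cities i and j (assumed given).
def successors(state, n, c):
    city, d = state
    if d == 'forward':
        for j in range(city + 1, n + 1):
            # arriving at city n turns us around
            nxt = (j, 'backward') if j == n else (j, 'forward')
            yield nxt, c[(city, j)]
    else:
        for j in range(city - 1, 0, -1):   # only backward moves
            yield (j, 'backward'), c[(city, j)]

def is_end(state):
    # done once we are back at city 1 on the return leg
    return state == (1, 'backward')
```

For example, with n = 3 the successors of (1, 'forward') are (2, 'forward') and (3, 'backward'): the same city numbers can appear twice, but never the same (city, direction) pair, so dynamic programming applies.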
[00:07:02] And then uniform cost search: that also sounds good, right? You could actually use uniform cost search; it doesn't matter whether you have cycles or not, and we have non-negative costs, so we could use it. [00:07:12] All right, so this was just a quick review of some of the things we talked about last time. Another thing we talked about last time was this notion of state. We started talking about tree search algorithms, and at some point we switched to dynamic programming and uniform cost search, where we don't have this exponential blow-up; the reason behind that was that we have memoization, and in addition to that we have this notion of state. [00:07:40] So what is a state? A state is a summary of all past actions that is sufficient for us to choose future actions optimally. So we need to be
[00:07:51] really careful about choosing our state. In this previous question we looked at past actions: if you look at all the cities you go through, it can be city one, then three, four, five, six, and then three again. [00:08:04] So in terms of state, what you want to keep track of is which city you're in, but in addition to that you want the directionality, because you need to know where you are and how you're getting back. And we did a couple of examples around that, trying to figure out the specific notion of state for various problems. [00:08:25] All right, so last time we started talking about search problems and formalizing them. If you remember our paradigm of modeling, inference, and learning, we started modeling search problems using this formalism, where we defined a starting state, s_start, and then we talked about the
[00:08:41] actions function, which is a function over states that returns all possible actions. Then we talked about the cost function, which takes a state and an action and tells us the cost of that edge; and the successor function, which takes a state and an action and tells us where we end up. And at the end we had this is-end function, which just checks whether you're in an end state or not. [00:09:04] These were all the things we needed to define a search problem, and we tried it on a couple of examples: the tram example and the city example.
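The five ingredients just listed (start state, actions, cost, successor, is-end) can be written down as a small interface. This sketch uses my own names and a made-up cost function (the forward-only city problem with cost j - i); it mirrors the formalism, not the course's actual code.

```python
# Sketch of the search-problem formalism: s_start, Actions(s),
# Cost(s, a), Succ(s, a), IsEnd(s), for the forward-only city problem.
class TravelProblem:
    def __init__(self, n):
        self.n = n

    def start_state(self):
        return 1                                   # begin in city 1

    def is_end(self, state):
        return state == self.n                     # done at city n

    def actions(self, state):
        # an action is the next city to move to, only going forward
        return list(range(state + 1, self.n + 1))

    def cost(self, state, action):
        return action - state                      # made-up edge cost

    def succ(self, state, action):
        return action                              # deterministic successor
```

Any of the search algorithms below only touch a problem through these five methods, which is what lets the same uniform-cost-search code run on the tram example, the city example, and so on.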
[00:09:17] After talking about these different ways of thinking about search problems, we started talking about various types of inference algorithms. We talked about tree search (depth-first search, breadth-first search, depth-first search with iterative deepening, backtracking search), and then after that some graph search algorithms, like uniform cost search and dynamic programming. [00:09:40] Last time we did an example of uniform cost search, but we didn't get to prove its correctness, so I want to switch to some of last time's slides to go over this quick theorem, and then after that switch back to this lecture. [00:10:01] Okay, so uniform cost search: if you remember what we were doing, we had three different sets. We had an explored set, which was the set of states that we have visited and are sure how to get to; we know the optimal path, we know everything about them. We had the frontier set, which was the set of states that we have gotten to, but we're not sure if the cost we have is the best cost;
there might be a better way of getting to them and we [00:10:28] better way of getting to them and we don't know like we are not sure yet and [00:10:30] don't know like we are not sure yet and then we have the uh neck Sports set of [00:10:32] then we have the uh neck Sports set of states which are basically states that [00:10:34] states which are basically states that we haven't seen yet so we did this [00:10:37] we haven't seen yet so we did this example where we started with all the [00:10:38] example where we started with all the states in the unexplored set and then we [00:10:40] states in the unexplored set and then we moved them to the frontier and then from [00:10:42] moved them to the frontier and then from the frontier we moved them to the export [00:10:44] the frontier we moved them to the export set so so this was the example that we [00:10:46] set so so this was the example that we did under board okay and then we [00:10:49] did under board okay and then we realized that like even if you have [00:10:51] realized that like even if you have cyclones we can actually do this [00:10:52] cyclones we can actually do this algorithm and then [00:10:53] algorithm and then we ended up finding the best path being [00:10:56] we ended up finding the best path being from A to B to C to D and that cost [00:10:58] from A to B to C to D and that cost three [00:10:59] three so let's actually implement uniform cost [00:11:03] so let's actually implement uniform cost search and so I think we didn't do this [00:11:05] search and so I think we didn't do this last time so going back to our set of so [00:11:11] last time so going back to our set of so we started writing up these algorithms [00:11:13] we started writing up these algorithms for search problems so we have we have [00:11:15] for search problems so we have we have written dynamic programming already and [00:11:17] written dynamic programming already and backtracking search so now we can we can [00:11:19] 
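To make the recap concrete, here's a minimal sketch of a search problem with those four ingredients (actions, cost, successor, isEnd), using the tram example; following the lecture's code, successors and costs are combined into one method, though the class and method names here are my assumptions rather than the course's exact starter code.

```python
class TransportationProblem:
    """Tram example: from state s you can walk to s+1 (cost 1)
    or take a magic tram to 2*s (cost 2); start at 1, end at N."""

    def __init__(self, N):
        self.N = N

    def startState(self):
        return 1

    def isEnd(self, state):
        # The end test: are we at the goal state?
        return state == self.N

    def succAndCost(self, state):
        # Combines actions, successor, and cost into one function:
        # returns a list of (action, newState, cost) triples.
        result = []
        if state + 1 <= self.N:
            result.append(('walk', state + 1, 1))
        if 2 * state <= self.N:
            result.append(('tram', 2 * state, 2))
        return result

problem = TransportationProblem(N=10)
print(problem.succAndCost(3))  # → [('walk', 4, 1), ('tram', 6, 2)]
```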
[00:11:21] To do so, we need this priority queue data structure. It's in a util file; I'm just showing you what functions it has: an update function and a removeMin function. It's just the data structure I'm going to use for my frontier, since I'm popping things off the frontier.

[00:11:43] All right, let's go back to uniform cost search. We're going to define this frontier, to which we add states from the unexplored set, and it's going to be a priority queue; we have that data structure because we just imported util. We're going to add the start state with a cost of 0 to the frontier; that's the first thing we do.

[00:12:09] Then, while the frontier is not empty (so, while True), we remove the minimum-past-cost element from the frontier: pop off the best thing that exists there and move it to the explored set. When I pop an element off the frontier, I get its past cost and the state. If we are in an end state, we just return that past cost with the history; I'm not tracking the history here for now, I'm just returning the cost.

[00:12:43] After popping a state off the frontier, the next thing was adding its children. The way we do that is with the successors-and-costs function we defined last time: we iterate over (action, newState, cost) in the successors-and-costs of the state, and update our frontier by adding these new states with cost pastCost plus the edge cost, if that is better; that's what the frontier's update function does.

[00:13:14] And that's pretty much it; that is uniform cost search. You add things to the frontier, you pop things off the frontier, and that way you explore them: you move states from the unexplored set to the explored set. So let's just try that out... it looks like it's doing the right thing; it got the same value as dynamic programming, so it seems to work. This code is also online, so you can take a look at it later. (Actually, this is not what I want to do... yeah, okay.) [00:13:50] And here's also the pseudocode of uniform cost search.
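Putting the steps just described together, here's a self-contained sketch of uniform cost search. The PriorityQueue mimics the update/removeMin interface mentioned above (its internals are my assumption, not the actual util file), and like the lecture's version it returns only the past cost, not the history of actions.

```python
import heapq

class PriorityQueue:
    """Sketch of the util priority queue: update lowers a state's
    priority if better; removeMin pops the best unexplored state."""
    DONE = -100000  # sentinel priority marking explored states

    def __init__(self):
        self.heap = []
        self.priorities = {}  # state -> best priority seen so far

    def update(self, state, newPriority):
        oldPriority = self.priorities.get(state)
        if oldPriority is None or newPriority < oldPriority:
            self.priorities[state] = newPriority
            heapq.heappush(self.heap, (newPriority, state))
            return True
        return False

    def removeMin(self):
        while self.heap:
            priority, state = heapq.heappop(self.heap)
            if self.priorities[state] == PriorityQueue.DONE:
                continue  # stale heap entry; state already explored
            self.priorities[state] = PriorityQueue.DONE
            return state, priority
        return None, None  # frontier is empty

def uniformCostSearch(problem):
    # Frontier starts with just the start state at past cost 0.
    frontier = PriorityQueue()
    frontier.update(problem.startState(), 0)
    while True:
        # Move the best frontier state to the explored set.
        state, pastCost = frontier.removeMin()
        if state is None:
            return None  # frontier empty: no path exists
        if problem.isEnd(state):
            return pastCost  # returning only the cost, not the history
        # Add the children of the popped state to the frontier.
        for action, newState, cost in problem.succAndCost(state):
            frontier.update(newState, pastCost + cost)

class SmallProblem:
    """Tram-style example: walk s -> s+1 costs 1, tram s -> 2s costs 2."""
    def startState(self): return 1
    def isEnd(self, state): return state == 10
    def succAndCost(self, state):
        result = []
        if state + 1 <= 10: result.append(('walk', state + 1, 1))
        if 2 * state <= 10: result.append(('tram', 2 * state, 2))
        return result

print(uniformCostSearch(SmallProblem()))  # → 6 (walk, tram, walk, tram)
```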
[00:14:05] There's a question: what's the runtime of uniform cost search? The runtime of uniform cost search is order n log n, where the log n comes from the bookkeeping of the priority queue, and you're going over all the edges, so you can think of n here as the edges. In the worst case, if you have a fully connected graph, it's technically n squared log n, but in practice we have sparser graphs, so people usually refer to it as just n log n, where n is the number of states you have explored; note it's not all of the states, only the states you have explored. And dynamic programming is order n, so technically dynamic programming is slightly better.

[00:14:54] So, the question was: what's the difference between this and Dijkstra's algorithm? They're very similar. The only difference is that this is solving a search problem, so you're not exploring all the states: when you get to the solution, you just return it. In Dijkstra's algorithm you're basically exploring all of the states in your graph.

[00:15:17] All right, sounds good. Okay, so I just want to quickly talk about this correctness theorem. For uniform cost search we actually have a correctness theorem, which basically says that uniform cost search does the right thing. What the theorem says is: if we have a state s that we are popping off the frontier, moving it from the frontier to the explored set, then its priority value, which is equal to PastCost(s), is actually the minimum cost of getting to the state s.

[00:15:46] So what this is saying is: let's say that this is my explored set,
and then right here is my [00:15:55] exports and then right here is my frontier and I have a start state okay [00:15:59] frontier and I have a start state okay and then I have some state s that right [00:16:04] and then I have some state s that right now I've decided that I am popping off s [00:16:06] now I've decided that I am popping off s from the frontier to export because that [00:16:09] from the frontier to export because that is the best thing that has the best pass [00:16:11] is the best thing that has the best pass cost so what the theorem says is this [00:16:14] cost so what the theorem says is this this path that I have from and start to [00:16:17] this path that I have from and start to s is the shortest path possible to get [00:16:20] s is the shortest path possible to get to get to the state s ok so the way to [00:16:23] to get to the state s ok so the way to prove that is to show that the cost of [00:16:25] prove that is to show that the cost of this path is lower than any other path [00:16:27] this path is lower than any other path paths that go from s start to s so let's [00:16:32] paths that go from s start to s so let's say there is some other path this green [00:16:34] say there is some other path this green one that goes from s star to s some [00:16:37] one that goes from s star to s some other way and the way that it goes to s [00:16:40] other way and the way that it goes to s is it should probably leave the export [00:16:43] is it should probably leave the export set of states from some state called t [00:16:46] set of states from some state called t maybe to some ghost go to some other [00:16:48] maybe to some ghost go to some other state nu and then from you go to us UN s [00:16:52] state nu and then from you go to us UN s can be the same thing but the point of [00:16:53] can be the same thing but the point of it is if I have this other path that [00:16:55] it is if I have this other path that goes through s it needs to leave the [00:16:58] 
goes through s it needs to leave the export set from some state so what I [00:17:02] export set from some state so what I want to show is I want to show that the [00:17:04] want to show is I want to show that the cost of [00:17:06] cost of the green line I want to show that that [00:17:10] the green line I want to show that that is greater than the cost of the black [00:17:12] is greater than the cost of the black line okay all right so the cost of the [00:17:14] line okay all right so the cost of the Green Line what is the cost of the Green [00:17:16] Green Line what is the cost of the Green Line it's going to be the cost to here [00:17:18] Line it's going to be the cost to here and then cost of T to you and the cost [00:17:20] and then cost of T to you and the cost of u to s so I can say well this cost is [00:17:23] of u to s so I can say well this cost is actually greater than or equal to [00:17:27] actually greater than or equal to priority of T because that is the cost [00:17:29] priority of T because that is the cost of getting to T plus cost of C to you [00:17:35] of getting to T plus cost of C to you and I'm just dropping that's this last [00:17:38] and I'm just dropping that's this last part the u to s I'm just dropping that [00:17:39] part the u to s I'm just dropping that okay so cost of green is like at least [00:17:43] okay so cost of green is like at least equal to priority of t plus cost of T TT [00:17:46] equal to priority of t plus cost of T TT to you okay well what is that equal to [00:17:49] to you okay well what is that equal to priority is just a number right it's [00:17:51] priority is just a number right it's just a number that you're getting off [00:17:52] just a number that you're getting off the priority queue so that is actually [00:17:55] the priority queue so that is actually equal to past cost of t plus cost of t [00:18:04] equal to past cost of t plus cost of t to you and and this value is going to [00:18:09] to you and and this value 
is going to actually be greater than or equal to [00:18:11] actually be greater than or equal to priority do you well why is that because [00:18:15] priority do you well why is that because if you is in my frontier I visited you [00:18:18] if you is in my frontier I visited you so I already have some priority value [00:18:20] so I already have some priority value for you and the value that I've assigned [00:18:22] for you and the value that I've assigned for the priority of you is either equal [00:18:25] for the priority of you is either equal to this path cost of T plus cos of T to [00:18:27] to this path cost of T plus cos of T to you because I've like seen that to use [00:18:29] you because I've like seen that to use in my export use in my frontier so I've [00:18:31] in my export use in my frontier so I've definitely seen this or it is something [00:18:33] definitely seen this or it is something better that I don't know what it is [00:18:35] better that I don't know what it is right so so priority of U is going to be [00:18:37] right so so priority of U is going to be less than or equal to this path cost of [00:18:40] less than or equal to this path cost of T plus cost of T to you okay and well [00:18:43] T plus cost of T to you okay and well what do I know in terms of priority of [00:18:45] what do I know in terms of priority of you and priority of s well I know [00:18:48] you and priority of s well I know priority of U is going to be greater [00:18:50] priority of U is going to be greater than or equal to priority of this well [00:18:54] than or equal to priority of this well why is that because I already know I'm [00:18:56] why is that because I already know I'm popping off s next [00:18:57] popping off s next I'm not topping off you like like I I [00:18:59] I'm not topping off you like like I I know I'm popping off the the thing that [00:19:00] know I'm popping off the the thing that has the least amount of priority and the [00:19:03] has the least amount of 
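The chain of inequalities assembled in this argument can be written compactly (notation follows the lecture: priority(·) is the value on the priority queue, PastCost(·) the cost of the best path found so far):

```latex
\begin{aligned}
\text{Cost(green)}
  &\ge \text{PastCost}(t) + \text{Cost}(t, u)
     && \text{drop the nonnegative } u \to s \text{ tail} \\
  &= \text{priority}(t) + \text{Cost}(t, u)
     && t \text{ is explored, so priority}(t) = \text{PastCost}(t) \\
  &\ge \text{priority}(u)
     && u \text{ is on the frontier with at most this cost} \\
  &\ge \text{priority}(s)
     && s \text{ is popped first, so its priority is minimal} \\
  &= \text{PastCost}(s) = \text{Cost(black)}.
\end{aligned}
```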
[00:19:03] And the least value here is s. And well, priority(s) is equal to the cost of the black line.

[00:19:13] So that was a quick proof of why uniform cost search always returns the minimum cost path. All right, let's go to the slides again.

[00:19:29] Just a quick comparison between dynamic programming and uniform cost search. We talked about dynamic programming: we know it doesn't allow cycles, but in terms of action costs it can handle anything; you can have negative costs, you can have positive costs, and its complexity is order n. Uniform cost search can handle cycles, which is cool, but the problem is the costs need to be non-negative, and it's order n log n. And if you end up in a situation where you have cycles and your costs are actually negative, there's this other algorithm called Bellman-Ford, which we are not covering in this class, but it addresses those sorts of settings.

[00:20:16] Okay, so that was this idea of inference. We now have a good set of ways of doing inference for search problems once we have formalized them. The plan for this lecture is to think about learning: how do we go about learning when our search problem is not fully specified, when there are things in the search problem, like the costs, that are not specified and we want to learn what they are? That's going to be the first part of the lecture. Then, towards the end of the lecture, we're going to talk about a few other algorithms that make things faster, smarter ways of making things faster: we're going to talk about A* and
[00:20:59] some relaxation-type strategies.

[00:21:03] All right, so let's go back to our transportation problem. This was the problem where we had a start state, and we can either walk, which takes us from state s to state s+1 at a cost of 1, or take a magic tram that takes us from state s to state 2s at a cost of 2, and we want to get to the end state. We can formalize that as a search problem, as we saw last time, and actually find the best path from state 1 to the end: for example, walk, walk, tram, tram, tram, walk, tram, tram is one potential optimal path one can get.

[00:21:43] But the thing is, the world is not perfect. Modeling is actually really hard; it's not the case that we always have this nice model with everything in it. We could end up in scenarios where we have a search problem and we don't actually know what the costs of our actions are: we don't know what the cost of walking or the cost of the tram is. But maybe we have access to the optimal path: maybe I know the optimal path is walk, walk, tram, tram, tram, walk, tram, tram, but I don't know the costs. The point of learning is to figure out what these cost values are from the optimal paths that we have; I want to actually learn that the cost of walking is 1 and the cost of the tram is 2.

[00:22:27] This is actually a common problem in machine learning in general. For example, you might have data on how a person does something, say how a person grasps an object, and I have no idea what cost the person was optimizing when grasping it, but I have the trajectory: I know the path they took when they picked up the object. So if I have access to that path, I can learn the cost function they were optimizing, and then maybe put that cost function on a robot that does the same thing.

[00:23:07] That's a good question. The question is: is it possible to have multiple solutions? Yes; we're actually going to see later what sorts of solutions we get, and there can be cases with multiple solutions. The ratio is the thing that matters: if walk is 1 and tram is 4, scaling that to 2 and 8 gives you the same sort of behavior. It also depends on what sort of data you have, whether your data allows you to actually recover the true solution. So we're going to
actually talk [00:23:34] solution so so we're gonna actually talk about all this cases okay all right okay [00:23:38] about all this cases okay all right okay so if you think about it when the search [00:23:42] so if you think about it when the search problem we were trying to solve this was [00:23:44] problem we were trying to solve this was the inference problem was when we were [00:23:46] the inference problem was when we were given kind of a search formulation and [00:23:48] given kind of a search formulation and we are given a cost and our goal was to [00:23:51] we are given a cost and our goal was to find a sequence of actions this optimal [00:23:53] find a sequence of actions this optimal sequence of actions that was the [00:23:54] sequence of actions that was the shortest path or the best path and and [00:23:56] shortest path or the best path and and some thought some way and this is a [00:23:58] some thought some way and this is a forwards problem so search is this [00:24:00] forwards problem so search is this forward problem where you're given a [00:24:01] forward problem where you're given a cost and you want to find the sequence [00:24:03] cost and you want to find the sequence of actions okay so it's interesting [00:24:05] of actions okay so it's interesting because learning in some sense is an [00:24:08] because learning in some sense is an inverse problem it's the inverse of [00:24:10] inverse problem it's the inverse of search so the inverse of search is if [00:24:13] search so the inverse of search is if you give me that sequence of actions [00:24:15] you give me that sequence of actions that's the best sequence of actions that [00:24:16] that's the best sequence of actions that you've got then can you figure out what [00:24:18] you've got then can you figure out what the cost this so so in some sense you [00:24:20] the cost this so so in some sense you can think of learning as this inverse [00:24:22] can think of learning as this inverse problem 
of search N and we are going to [00:24:24] problem of search N and we are going to kind of address that so I'm going to go [00:24:27] kind of address that so I'm going to go over one example to talk about learning [00:24:30] over one example to talk about learning and I'm actually going to use the [00:24:33] and I'm actually going to use the notation of the machine learning [00:24:35] notation of the machine learning lectures that we had at the beginning of [00:24:38] lectures that we had at the beginning of like last week basically so let's say [00:24:42] like last week basically so let's say that we have maybe I can draw this so [00:24:51] that we have maybe I can draw this so let's say that we have a search problem [00:24:53] let's say that we have a search problem without costs and that's our input so if [00:24:56] without costs and that's our input so if so so we are kind of framing this [00:24:58] so so we are kind of framing this problem of learning as a prediction [00:25:00] problem of learning as a prediction problem and if you remember prediction [00:25:01] problem and if you remember prediction problems and prediction problems we had [00:25:03] problems and prediction problems we had an input so our input was X okay and in [00:25:09] an input so our input was X okay and in this case you're saying our input is a [00:25:11] this case you're saying our input is a search problem search problem [00:25:15] search problem search problem without costs okay so that is my input [00:25:20] without costs okay so that is my input and then we have outputs and in this [00:25:24] and then we have outputs and in this case my my output Y is this optimal [00:25:27] case my my output Y is this optimal sequence of actions that one could get [00:25:29] sequence of actions that one could get yet so it's a solution path so it's a [00:25:32] yet so it's a solution path so it's a solution and what I want to do is I want [00:25:37] solution and what I want to do is I want to look 
like if you remember machine [00:25:38] to look like if you remember machine learning the idea was I would want to [00:25:40] learning the idea was I would want to find this predictor this F function f [00:25:42] find this predictor this F function f that would take an input f of X and then [00:25:44] that would take an input f of X and then it would basically return the solution [00:25:47] it would basically return the solution path and in other settings that it would [00:25:49] path and in other settings that it would generalize so so that was kind of the [00:25:50] generalize so so that was kind of the idea that we explored in machine [00:25:52] idea that we explored in machine learning and you kind of want to do the [00:25:53] learning and you kind of want to do the same thing in here so let's start with [00:25:56] same thing in here so let's start with I'm gonna draw that here so let's start [00:25:59] I'm gonna draw that here so let's start with an example where we are in city one [00:26:01] with an example where we are in city one and then maybe we walk to City 2 so we [00:26:05] and then maybe we walk to City 2 so we can walk to city 2 and then from there [00:26:08] can walk to city 2 and then from there maybe I have two options I can keep [00:26:10] maybe I have two options I can keep walking to get to City four so I can do [00:26:13] walking to get to City four so I can do walk walk walk [00:26:15] walk walk walk or maybe I can take the tram and end up [00:26:19] or maybe I can take the tram and end up in City four and and the thing is I [00:26:24] in City four and and the thing is I don't actually know what the costs of [00:26:25] don't actually know what the costs of these these actions are I don't know [00:26:28] these these actions are I don't know what the cost of don't walk is what the [00:26:30] what the cost of don't walk is what the cost of tram is but one thing I know is [00:26:34] cost of tram is but one thing I know is that my my solution path 
my y, is equal to walk, walk, walk. [00:26:47] So one way to go about this is to start with some initialization of these costs. The way we're defining these costs, I'm going to write it up here: I'm going to use W, because I want to use the same notation as the learning lectures. So W is going to be the weight of each one of my actions. I have two actions in this case: I can either walk or I can take the tram. So W of action 1 is W of walk, and W of action 2 is W of tram. I'm defining these W values, these weights, just as a function of the action. This could technically be a function of states and actions, but right now I'm simplifying, and I'm saying these W values, the cost
of walking, the cost of going from 1 to 2, just depends on my action. It doesn't depend on what state I'm in. You could imagine settings where it actually depends on what city you're in, too. [00:28:03] Okay, so under that scenario, what is the cost of y? It's going to be W of walk plus W of walk plus W of walk. So what I'm suggesting is, let's just start with something; let's just start with these weights. I'm going to say walking costs three, and it's always going to cost three. Again, the reason it's always going to cost three is that my weights only depend on the action; they don't depend on the state. And I'm going to say, why not, let's just say the tram has a cost of two. Okay, so this doesn't look right, but let's
just say, assume this is the right solution. [00:28:44] Okay, so now what I want to do is update these weights, update these values, in a way that gets me this optimal path, this walk, walk, walk. How can I do that? I started with this random initialization of the weights. Now that I've done that, I can try to figure out what the optimal path is based on these weights. So what is my prediction? That is y prime, my prediction based on these weights that I've just set up. Well, what is it? It is walk, tram, because this costs five and this costs nine. So with these random weights I've just come up with, I'm going to pick walk, tram, and that is my prediction. [00:29:31] Okay, so now what we want to do is update our W's based on the fact that our true label is walk,
[00:29:37] walk, walk, and our prediction is walk, tram. And the algorithm that does this is just about the silliest thing possible. What it's going to do is first look at the true y. So the weights are starting from where I decided: this one is three and this one is two, and I'm going to update them. I'm going to look at every action in this true path, and for every action in it I'm going to downweight the weight of that action. Why am I going to do that? Because I don't want to penalize it, right? This is the true thing, and I want the weight of the true thing to be small. So I see walk; the weight of that was three, so I'm going to bring it down by one and make it two. I see walk again, so I bring it down to one. I see
walk again, so I subtract one again and end up at zero. [00:30:34] Okay, now I'm going to go over my prediction, and for every action I see there I'm going to bring the weight up by one. So I see walk here, and I'm going to bring it up by one. So these were subtract, subtract, subtract, then bring it up by one because it's in my y prime, and then I see tram, and because I see tram I'm going to bring this up by one, which ends at three. So my new weights are: the weight of walk just became one, and the weight of tram just became three. [00:31:18] And now I can repeat this and see whether it gets me the optimal solution or not. So I'm going to try running my search algorithm. If I run my search algorithm, this path costs four and this path costs three, so I'm actually going to get this path. So my new prediction is just
going to be walk, walk, walk. The two are going to be the same thing, my weights are not going to change, and I'm going to converge. [00:31:40] So I'm talking about a very simplified version of this, but the idea is always the same. The very simplified version is the one where I'm saying the W's depend only on the actions. If you make the weights depend on states and actions, there is a more generalized form of this; it's called the structured perceptron algorithm. We'll briefly talk about the version where there are states and actions too, but in this case, where it just depends on the action, you're literally bringing it up by one, and whatever you bring up here you've got to bring down by the same amount, so it's plus one and minus one. [Student] Why are we doing the plus one and the minus one? So, I'll get to that. When I look at y here, right, this
is the thing that I really want. [00:32:28] So when I see walk, I realize that walking was a good thing, so I need to bring down its weight. But if the weights I already had knew that walking is pretty good, I should cancel that out. That's why we're doing the plus one: at this stage I already knew walking was pretty good up here, my prediction also said walk, so if I'm subtracting it off I should add it back to cancel it out. But right here I didn't know walking was good, so I'm going to bring down the weight of walk and bring up the weight of tram. I mistakenly thought tram was the way to go, so to avoid that next time around, I'm going to make the cost of tram higher so I don't take that route anymore. Is there a question here?
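The update just walked through on the board can be written as a tiny Python sketch (the weight dictionary and the action names here are illustrative, not the course's actual code):

```python
# One structured-perceptron update for action-only weights (sketch).
# True path: walk, walk, walk. Prediction under the initial weights
# w = {walk: 3, tram: 2} is walk, tram (cost 5 instead of 9).
w = {"walk": 3, "tram": 2}

y_true = ["walk", "walk", "walk"]   # the true label y
y_pred = ["walk", "tram"]           # the prediction y'

for a in y_true:   # decrease the cost of every action on the true path
    w[a] -= 1
for a in y_pred:   # increase the cost of every action on the predicted path
    w[a] += 1

print(w)  # {'walk': 1, 'tram': 3}, matching the new weights on the board
```

Actions shared by y and y prime cancel out, which is exactly the plus-one/minus-one bookkeeping being described.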
[00:33:21] [Student, partly inaudible] ...so y prime is different from walk? Yes, but what if we have a long sequence and y prime is only different in one small location; would that change the weights efficiently? Yeah, so you're asking: what if my y and y prime are basically the same thing, walk, walk, walk or something, and only the very last one is different? For that last one you're just adding one, right? So it does actually address that, and you can run it until the sequences are exactly the same, so you don't have any mistakes. [00:34:00] [Student] Does it matter if our new costs become negative? It depends on what sort of search algorithm you're using.
[00:34:10] At the end of the day, it's fine if you're using dynamic programming: I can have a negative cost here, and I'm just calling dynamic programming at the end of the day with that, and that is fine. The other one is fine if the cost becomes... [00:34:40] The question is: we got one and three here, is this actually right? If you remember, when we defined the tram problem we said walking costs one and the tram costs two, but we never got that. The reason we never got it is that the solution we get here is just based on our training data. So if my training data is just walk, walk, walk, this is the best thing I can get, and I can converge to a solution where the two end up being equivalent. If I have more data points, I'm going to run this longer, and if I actually try it on other training data, then I
might converge to a different thing. [00:35:15] [Student] As far as initializing the weights, I'm assuming the further away you are from the actual truth, the longer it's going to take? Okay, so the question is how we initialize. In the actual algorithm you're just initializing with zero, we're initializing everything to zero, and it's actually not that bad, because you basically just have this sequence; in the more general case you're computing a feature vector, you compute the full thing and do one single update, so it is not that costly. [00:36:05] [Student, partly inaudible] ...if we have that input, can we incorporate that? So you're saying if we have some prior knowledge about the cost, can it be incorporated? Yeah, that is interesting. In this current format, if you have some prior, then maybe your prediction is going to be better, right? So if you have
some knowledge about it, maybe you'll get a better prediction, and then based on that you won't update as much. So maybe you can incorporate it into the search problem. But again, this is the simplified version of the algorithm, where the weights depend only on the action, so it's not doing anything fancy; it's not doing anything that hard either. [00:36:45] [Student] Does this overfit at all? Yes, it can. I'll show some examples of this: we're going to code this up, and then we'll see overfitting kinds of situations, so I'll get back to that. All right, so let's move on. [00:37:02] Okay, so this is just what's on the slides, what I've already talked about. So here is the example: we start with three for walk and two for tram, and the idea is, how are we going to change the
costs so that we get the solution we were hoping for? And as I was saying, we can assume the costs only depend on the action, so I'm assuming Cost(s, a) is just W(a); in the most general form it could depend on the state too. [00:37:35] Okay, so if you take any candidate output path, what would the cost of that path be? It would just be the sum of these W values over all the edges, so it would be W(a1) plus W(a2) plus W(a3), and as you've seen in this example, the cost of a path is just W(walk) plus W(walk) plus W(walk), or W(walk) plus W(tram). That's all this slide is saying; that's how we compute the cost. [00:38:00] All right, so now let's actually look at this algorithm running in practice. Let me go over this code. So we start by initializing W to be equal to
zero, and then after that we're going to iterate for some number of iterations T, and we have a training set of examples. It might not be just one; here I showed just one example, where the only training example I had was that walk, walk, walk is a good thing, but you can imagine having multiple training examples for your search problem. Then what you can do is compute your prediction, that is y prime, given some W. You can start with W equal to zero, compute your prediction y prime, and then do this plus-and-minus type of update: for each action in your true y, your true label, you subtract one, to decrease the cost of the true y, and for each action in your prediction you add one, to increase the cost of the predicted y.
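Spelled out in Python, the loop on the slide looks roughly like this sketch. `predict` here stands in for whatever search procedure computes y prime, and the optional `w0` starting point is an addition of mine so the earlier blackboard numbers can be reproduced; the slide itself starts from zero:

```python
def structured_perceptron(examples, actions, predict, num_iters=20, w0=None):
    """examples: list of (x, true_actions); predict(x, w) returns an action list."""
    w = dict(w0) if w0 else {a: 0 for a in actions}
    for t in range(num_iters):
        mistakes = 0
        for x, true_actions in examples:
            pred_actions = predict(x, w)      # y' under the current weights
            if pred_actions != true_actions:
                mistakes += 1
            for a in true_actions:            # decrease cost of the true path
                w[a] -= 1
            for a in pred_actions:            # increase cost of the predicted path
                w[a] += 1
        if mistakes == 0:                     # predictions match all the labels
            break
    return w

# Tiny usage matching the blackboard example: the true path is walk,walk,walk,
# and this toy predictor compares just the two candidate paths discussed above.
def predict(x, w):
    paths = [["walk", "walk", "walk"], ["walk", "tram"]]
    return min(paths, key=lambda p: sum(w[a] for a in p))

w = structured_perceptron([(None, ["walk", "walk", "walk"])],
                          ["walk", "tram"], predict,
                          w0={"walk": 3, "tram": 2})
print(w)  # {'walk': 1, 'tram': 3}: walking is now cheaper, as on the board
```

Note that when the prediction already matches the true label, the subtract-one and add-one passes cancel exactly, so converged weights stop changing.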
[00:39:03] Okay, all right, so let's look at implementing this and try some examples. Let's go back to the problem. This is again the same tram problem; we just want to use the same sort of format. I actually went back and wrote up the history here: if you remember, last time I was saying I'm not returning the history, and now we have a way of returning the history from each one of these algorithms, because we're going to call dynamic programming and we need the history. [00:39:30] All right, so let's go back to our transportation problem. We had costs of 1 and 2 for walking and the tram, but what we want to do is put parameters there, so we actually put this weight in and give it to our transportation problem. In addition to the number of blocks, I'm now going to give it the weights of the different actions. All right, so then walking has a weight and
[00:39:58] the tram has a weight, so now I have updated my transportation problem to take different values in general. So now we want to be able to generate some training examples. That's what I want to do: I want to generate different training examples that we can call on to get these true labels. Let's assume the true weights for our training examples are just one and two; that is what we really want. [00:40:26] And then we're going to write this prediction function that we can call later to get different values of y. The prediction function is going to get the number of blocks, so it gets an n, the number of blocks here, and it's going to output this path that we want, these y values, each time. All right, so the whole point of prediction is basically running
this f of x function. [00:41:04] We can define our transportation problem with n and the weights, and the way we're going to get this is by calling dynamic programming. Someone asked earlier, could the cost be negative? Well, yes: now I'm calling dynamic programming, and if the problem has negative costs, that's fine too. The history is going to have the action, the new state, and the cost, right? But the thing I actually want to return from my predict function is a sequence of actions, so I'll just get the actions out of this history that I get from dynamic programming. So calling dynamic programming on my problem returns a history; I get the sequence of actions from that, and that is my predict function, which I can just call later. [00:41:49] So let's go back to generating examples. I'm just going to try n going from 1
to 10, so one block to ten blocks, and we're calling the predict function on these true weights to get the true y values. These are my true labels, and those are my examples; my examples are just calling generate examples here. [00:42:14] So let's print out our examples and see what they look like. We haven't done anything in terms of the algorithm yet; we're just creating these training examples by calling this predict function on the true weights. I have a typo here, generate examples needs parentheses; I'll fix the typo. Okay, so that kind of looks right, right? Those are my training examples, one through nine, and each is the path you would want to take if you had these true weights, the one and two. [00:42:49] Okay, so now I have my examples, and I'm ready to write this structured perceptron algorithm. It gets my examples, the training examples, which are these
[00:43:02] It gets the training examples, which are these paths, and then we're going to iterate for some range, and then we can basically go over all the examples that we have, our true y values, and then we can go and update our weights based on that and based on our predictions. So let's initialize the weights to just be zero; that's for walk and tram, they're both just zero. And pred actions, this is when we're calling predict based on the current weights; so if my current weights are zero, then pred actions is just that y prime. So pred actions is y prime, true actions is y, like the things that we had on the slides. Okay, and then I want to count the number of mistakes I'm making too: if the two are not equal to each other, then I'm going to keep a counter for the number of mistakes; if the two become equal, then my number of mistakes is zero and I can break out of the loop.
[00:44:02] Maybe I'm happy then. Okay, so I make a prediction, and then after that I'm going to update the weight values. Okay, so how do I update? Well, basically subtract one if you're in true actions, which is y, the labels that I've created from my training examples, and then plus one if you're in pred actions, based on the current weight values. And that's pretty much it; that is structured perceptron. Okay, so let's just print things nicely, so we can print the iteration and the number of mistakes we have and what the weight values actually are, and I'm just breaking whenever I have no mistakes. So if the number of mistakes is zero, I'll just break. That sounds good. Okay, so all good, I'm going to run this; it's not going to do anything because I didn't call it, so I'll go back and actually call it.
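The loop being built up here can be sketched end-to-end. The predict function (dynamic programming over the walk/tram problem) is re-sketched inline so the snippet is self-contained; names are illustrative, and tie-breaking details differ from the live session, so the learned weights may differ from the (1, 2) recovered in lecture while still reaching zero mistakes.

```python
# A runnable sketch of the structured perceptron loop walked through above.

def predict(n, weights):
    """Cheapest action sequence from block 1 to block n under `weights`."""
    cache = {}
    def future(s):
        if s == n:
            return (0.0, [])
        if s not in cache:
            cands = []
            if s + 1 <= n:                      # walk: s -> s+1
                c, rest = future(s + 1)
                cands.append((weights['walk'] + c, ['walk'] + rest))
            if 2 * s <= n:                      # tram: s -> 2s
                c, rest = future(2 * s)
                cands.append((weights['tram'] + c, ['tram'] + rest))
            cache[s] = min(cands, key=lambda t: t[0])
        return cache[s]
    return future(1)[1]

def structured_perceptron(examples, num_iters=100):
    w = {'walk': 0.0, 'tram': 0.0}              # initialize weights to zero
    for it in range(num_iters):
        num_mistakes = 0
        for n, true_actions in examples:
            pred_actions = predict(n, w)        # y' under the current weights
            if pred_actions != true_actions:
                num_mistakes += 1
                for a in true_actions:          # subtract 1 for true actions
                    w[a] -= 1
                for a in pred_actions:          # add 1 for predicted actions
                    w[a] += 1
        print(f'iteration {it}: numMistakes = {num_mistakes}, weights = {w}')
        if num_mistakes == 0:                   # fits the training data: done
            break
    return w

true_weights = {'walk': 1.0, 'tram': 2.0}
# n = 1..9, as in the lecture's printed examples (range endpoint is exclusive).
examples = [(n, predict(n, true_weights)) for n in range(1, 10)]
learned = structured_perceptron(examples)
```

After convergence the learned weights make zero mistakes on the training paths, even when they are not numerically equal to the true weights; as discussed in the lecture, only the walk/tram ratio is really pinned down by the data.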
[00:45:20] I have another typo here; see if you guys can guess where my typo is... this is going to give an error. Well, I called it "weights", not "weight", so I'll go fix that. Okay, this is running, and then this is what we get. So let's actually look at this. What we got is: in the first iteration the number of mistakes was six, and actually by the first iteration we ended up converging to one, two. So in the second iteration the number of mistakes just became zero, and we got one, two, which is the weights that we were hoping for. Okay, so that kind of looks okay to me; that's my training data, everything looks fine. There's a question, actually.
[00:46:25] [Student question, partly inaudible: something like, we're assuming the numbers of walks and trams are different; if the tram was in a different location but the number of appearances was the same, would it still solve it?] I see what you're asking; no, it should figure that out. We can go over an example after class and I'll show you how it actually does it. All right, so let's try 1 and 3. With 1 and 3 it takes a little bit longer, but it does recover. And 1 and 4 is actually the interesting one, because it does recover something: it recovers 2 and 8. It doesn't recover 1 and 4, but given my data, 2 and 8 is fine; there is no reason for me to get exactly 1 and 4, because the ratio of them is the thing that I actually care about. So even if I get 2 and 8, that is a reasonable set of weights that one could get. I'm going to try a couple more things, so let's try 1 and 5. I try 1 and 5, and this is what I get.
[00:47:35] I get the weight of walk to be minus one and the weight of tram to be one, and the number of mistakes is zero. So why is this happening? — Your training data is just all walking, so it's learning to just walk. — Yeah, so what's happening here is, if you look at my training data up here, my training data just has all walks; it hasn't seen tram, ever, so it has no idea what the cost of tram is with respect to the cost of walk, and it's not going to learn that. So we're going to fix that; one way to fix it is to go and change the training data and actually get more data, so we can do that. Just one thing to remember is that this is just going to fit your training data, whatever it is. So when we fix that, walk becomes 2 and tram becomes 9, which is not 1 and 5, but it is getting there; it's a better ratio, and the number of mistakes is still zero. So it really depends on what you're looking for.
[00:48:31] If you're trying to match your data, and your number of mistakes is zero and you're happy with that, you can just go with this, even though it hasn't actually recovered the exact values or the exact ratios; that's fine. Or maybe you're looking for the exact ratios, and then you should run it longer, for more iterations. There's a question: is structured perceptron susceptible to getting stuck in local optima? Sorry, I was looking away. That is a good question; actually, let me think about that. Percy, do you actually know if this gets into local optima? I haven't experienced it personally. — I feel like there are reasons for it to do this; let me think about it, because even in a more general form it's commonly used in matching, like words and sentences, and I haven't experienced it either, but I can look into that.
[00:49:51] And I'll get back to you. — Aren't we just feeding it all of the optimal paths? — Yes; if we do feed it all the optimal paths, then technically it should just converge, right, because it can just match them. — If you're feeding it all the optimal paths, it should just be matching paths, you're saying; yeah. So in terms of bringing down the number of mistakes, it should always match. But if you have some true weights that you're looking for, and they're not represented in your data set, then it's not necessarily learning those; in those settings it could fall into local optima. Kind of another version of this is when you're doing reward learning, and you actually have a true reward you want to find; in those settings you can totally fall into local optima,
[00:50:43] because you want to find out what the reward function is. But you're right that if you're just matching the data... — So the scaling would be a different problem, right? — Yeah, so the scaling is: you can have reward shaping, so you can have different versions of the reward function, and if you get any of them, that is fine; but you might still get into local optima that's not explained by reward shaping. Okay, so we can talk about these things, they're fine, but maybe I should just move on to the next topic, because we have some more stuff going on. Okay, so I was actually going to skip these slides because we have stuff coming up, but this is a more general form of it. So remember, I was saying this w is a function of a; but you could have a more general form where your cost function is not just w as a function of a, it is actually w times a set of features.
[00:51:41] And then the cost of a path is w times the features of the path, which is just the sum of the features over the edges. So you can have this more general form; go over the slides later on, maybe, because we've got to move to the next part. But just real quick, the update here is this more general form of update: update your w by subtracting the features over your true path and adding the features over your predicted path. The more general form of this is called the Collins algorithm; Mike Collins was working on this in natural language processing, and he was actually interested in it in the setting of part-of-speech tagging. So you might have a sentence, and you want to tag each of the words here as a noun or a verb or a determiner, and so on. So he was basically looking at this problem as a search problem.
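The general update just stated can be written out with an explicit feature map. The simplest choice, one indicator feature per action, recovers the plus-one/minus-one action-count update used earlier; the names here are illustrative.

```python
from collections import Counter

# Sketch of the general feature-based update described above:
#   w <- w - phi(true path) + phi(predicted path),
# where phi(path) sums features over the edges.

def phi(path_actions):
    return Counter(path_actions)            # feature vector as sparse counts

def perceptron_update(w, true_actions, pred_actions):
    for f, v in phi(true_actions).items():  # subtract features of true path
        w[f] = w.get(f, 0.0) - v
    for f, v in phi(pred_actions).items():  # add features of predicted path
        w[f] = w.get(f, 0.0) + v
    return w

w = perceptron_update({}, ['walk', 'walk', 'tram'], ['walk'] * 4)
print(w)  # {'walk': 2.0, 'tram': -1.0}
```

With richer features (e.g. per-edge or per-tag indicators, as in part-of-speech tagging), only `phi` changes; the update stays the same.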
[00:52:31] And he was using similar types of algorithms to try to match each of these part-of-speech tags to the sentence. So he has some scores, and based on the scores and his data set he moves the scores up and down, which uses the same idea. You can use the same idea again in machine translation: if you have heard of beam search, you can have a bunch of translations of some phrase, and then you can up-rate and down-rate them based on your training data. Okay, all right. So now let's move to A*; A-star, not AI-star: A* search. All right, so we've talked about this idea of learning costs; we've talked about search problems in general, doing inference, and then doing learning on top of them.
[00:53:26] And now I want to talk a little bit about making things faster, using smarter ideas and smarter heuristics. There's a question. — What is the loss function that we are trying to minimize in this structure? — So this is a prediction problem, right; in this prediction problem we're trying to figure out what the w's are while matching these y primes as closely as possible to y. So basically, the way we're solving this is not necessarily as an optimization, the way we have solved other types of learning problems; the way we're solving it is just by tweaking these weights to try to match my y prime as closely as possible to y. Okay, all right. So let's talk about A*. I don't have internet, so I can't show these, but I think the links should work when you go to the file.
[00:54:27] So the idea is, if you go back to uniform cost search: in uniform cost search, what we wanted to do was get from a point to some solution, but we would uniformly explore the states around us until we reached some final state. The idea of A* is to basically do a uniform cost search, but do it a little bit smarter and move towards the direction of the goal state. So if I have a goal state, say in that corner, maybe I can move in that direction more cleverly, right. Okay, so here is an example of that, pictorially. I can start from a start, and if I'm using uniform cost search, again I'm uniformly exploring all the states possible until I hit my s_end, and then I'm happy, I'm done, I've solved my search problem, everything is good. But the thing is, I've done all this wasted effort on this side, which is just not that great.
[00:55:22] So uniform cost search, in that sense, has this problem of exploring a bunch of states for no good reason, and what we want to do is take into account that we're just going from s_start to s_end, so we don't really need to do all of that; we can actually just try to get to the end state. Okay, so going back to how these search problems work: the idea is to start from s_start and then get to some state s, and then we have this s_end. What uniform cost search does is it basically orders the states based on PastCost(s), and then explores everything around it based on PastCost(s) until it reaches s_end. Okay, but when you're in state s, there is also this thing called FutureCost(s), right; and ideally, when I'm in the state s, I don't want to explore other things on this side.
[00:56:25] I actually want to move in the direction of reducing my future cost and getting to my end state. Okay, so the cost of me getting from s_start to s_end is really just PastCost(s) plus FutureCost(s), and if I knew what FutureCost(s) was, I would just move in that direction. But if I knew what FutureCost(s) was, well, the problem would already be solved, right; I'd have the answer to my search problem, and I'm still solving the problem. So in reality I don't have access to future costs; I have no idea what the future cost is. But I do have access to something else, potentially, and I'm going to call that h(s), and that is an estimate of the future cost. So I'm going to add a function called h(s), and this is called a heuristic, and a heuristic can estimate what the future cost is.
[00:57:21] And if I have access to this heuristic, maybe I can update my cost so that, in addition to the past cost, I add this heuristic, and that helps me be a little bit smarter when I'm running my algorithm. Okay, so the idea is: ideally, what I would want to do is explore in the order of PastCost(s) plus FutureCost(s). I don't have the future cost; if I had the future cost, I'd have the answer to my search problem. Instead, what A* does is it explores in the order of PastCost(s) plus some h(s). So remember, uniform cost search explores just in the order of past costs; in uniform cost search we don't have that h(s). And h(s) is a heuristic; it's an estimate of the future cost. All right, so what does A* do? It's actually something really simple. A* basically just does uniform cost search; all it does is uniform cost search with a new cost.
before I had this [00:58:18] with a new cost so before I had this blue cost cost of SN a this was my cost [00:58:22] blue cost cost of SN a this was my cost before now I'm going to update my cost [00:58:24] before now I'm going to update my cost to be discussed prime of SN a which is [00:58:28] to be discussed prime of SN a which is just cost plus the heuristic over the [00:58:30] just cost plus the heuristic over the successor of SN a minus the heuristic so [00:58:33] successor of SN a minus the heuristic so so that is the new cost and I can just [00:58:36] so that is the new cost and I can just run uniform cost search on this new cost [00:58:38] run uniform cost search on this new cost so so I'm gonna call it cost prime [00:58:42] so so I'm gonna call it cost prime listener well what is that equal to that [00:58:46] listener well what is that equal to that is equal to cost of SN a which is what [00:58:48] is equal to cost of SN a which is what we had before when we're doing uniform [00:58:50] we had before when we're doing uniform cost search plus heuristic over [00:58:54] cost search plus heuristic over successor of SN a - heuristic over s so [00:59:02] successor of SN a - heuristic over s so why do I want this well what this is [00:59:05] why do I want this well what this is saying is if I'm at in some state s ok [00:59:08] saying is if I'm at in some state s ok and there is some water state successor [00:59:12] and there is some water state successor of SN a so I can take an action a and [00:59:15] of SN a so I can take an action a and end up in successor of SN a and there is [00:59:17] end up in successor of SN a and there is some s end here that I'm really trying [00:59:19] some s end here that I'm really trying to get to remember H was my estimate of [00:59:23] to get to remember H was my estimate of future cost what this is saying is my [00:59:26] future cost what this is saying is my estimate of future cost for getting from [00:59:29] estimate of future 
[00:59:26] What this is saying is: my estimate of the future cost of getting from Succ(s, a) to s_end, minus my estimate of the future cost of getting from s to s_end, should be the thing I'm adding to my cost function; I should penalize by that. And what this is really enforcing is that it makes me move in the direction of s_end, because if I end up in some other state that is not in the direction of s_end, then the thing I'm adding here is basically going to penalize that, right; it's going to say, well, it's really bad that you're taking that action, I'm going to put more cost on it, so you never go in that direction; you should go in the direction that goes towards your s_end. And that all depends on what your h function is, how good of an h function you have, and how you're designing your heuristics, but that's kind of the idea behind it.
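The reduction just described can be sketched directly: A* is uniform cost search run on the modified edge costs Cost'(s, a) = Cost(s, a) + h(Succ(s, a)) − h(s). The chain graph below (all edges cost 1) and the names are illustrative, not the lecture's exact code.

```python
import heapq

def uniform_cost_search(start, goal, succ_and_cost):
    """Dijkstra-style UCS; returns the minimum total cost from start to goal."""
    frontier = [(0, start)]
    explored = set()
    while frontier:
        past_cost, s = heapq.heappop(frontier)
        if s == goal:
            return past_cost
        if s in explored:
            continue
        explored.add(s)
        for action, s2, cost in succ_and_cost(s):
            heapq.heappush(frontier, (past_cost + cost, s2))

def a_star(start, goal, succ_and_cost, h):
    # UCS on Cost' explores toward the goal; a consistent h keeps Cost' >= 0.
    def modified(s):
        return [(a, s2, cost + h(s2) - h(s)) for a, s2, cost in succ_and_cost(s)]
    # The h terms telescope along any path, shifting every path's cost by the
    # constant h(goal) - h(start); undo that shift to recover the true cost.
    return uniform_cost_search(start, goal, modified) + h(start) - h(goal)

edges = {'A': ['B'], 'B': ['A', 'C'], 'C': ['B', 'D'], 'D': ['C', 'E'], 'E': ['D']}
succ = lambda s: [('move', s2, 1) for s2 in edges[s]]
h = {'A': 4, 'B': 3, 'C': 2, 'D': 1, 'E': 0}.get
print(a_star('C', 'E', succ, h))  # 2, the same cost UCS finds, explored smarter
```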
[01:00:22] So let's say we have this example, where we have A, B, C, D, and E, with a cost of 1 on all of these edges, and what we want to do is go from C to E — that's our plan. If I'm running uniform cost search, what would I do? I'm at C, so I'm going to explore B and D, because they have a cost of 1; after that I'm going to explore A and E, and then finally I get to E. But why did I spend all that time exploring A and B? I shouldn't have done that — A and B are not in the direction of getting to s_end. So instead, if someone comes in and tells me, "I have this heuristic function, you can evaluate it on your state," and this heuristic function gives you 4, 3, 2, 1, and 0 for each of these states, then you can update your cost and maybe have a better way of getting to s_end.

[01:01:09] This heuristic is actually perfect in this case, because it's exactly equal to the future cost — the point of the heuristic is to get as close as possible to the future cost, and this one is exactly equal to it. So with this heuristic, my new cost is going to change. How is it going to change? It's going to become the cost of the edge before, which was 1, plus the heuristic difference. For example, the cost of going from C to B: it's the old cost, which was 1, plus the heuristic at B, which is 3, minus the heuristic at C, which is 2. That gives 1 + 3 − 2 = 2. Similarly, you can compute all of these new cost values — the purple values — and that gives a cost of 2 for going in this direction and a cost of 0 for going toward E. And if I just run uniform cost search again here, I can get to E much more easily.
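To check the arithmetic, here is the same reweighting applied to every edge of this line graph — a sketch; the edge-dict encoding is mine, not the course's:

```python
# Perfect heuristic h = FutureCost = distance to E on the line A-B-C-D-E.
h = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}
line = ["A", "B", "C", "D", "E"]
edges = [(s, t) for s, t in zip(line, line[1:])] + \
        [(t, s) for s, t in zip(line, line[1:])]

# New cost of each unit-cost edge: cost' = 1 + h(target) - h(source).
cost_prime = {(s, t): 1 + h[t] - h[s] for s, t in edges}

print(cost_prime[("C", "B")])  # 1 + 3 - 2 = 2, the value computed in lecture
print(cost_prime[("C", "D")])  # 1 + 1 - 2 = 0: stepping toward E is free
```

Every edge pointing toward E gets modified cost 0 and every edge pointing away gets 2, so uniform cost search on the purple costs walks straight from C to E.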
[01:02:05] [A student asks whether A* is related to greedy approaches.] So the question is whether A* is like a greedy approach. No — we're going to talk about that a little bit. A* depends on the heuristic you are choosing: depending on the heuristic, A* is actually going to return the optimal value. It does depend on the heuristic, but it does the exact same thing as uniform cost search if you choose a good heuristic. [Another student asks why the cost from C to B looks bad when it really isn't.] Oh, I see what you're saying — that's what we started with. This is the graph I started with: the blue costs were all 1. But now I'm saying those costs are not good, and I'm going to update them based on this heuristic so I can get closer to the goal as fast as possible.

[01:03:26] [A student asks what cost you would return at the end.] Right — the question is what cost you return at the end, and you do want to return the actual cost. You return the actual cost, but you run your algorithm with this heuristic term added in, because that allows you to explore fewer things and be more efficient. Okay, I've got to move on.

[01:03:49] All right, so a good question to ask is: what does this heuristic look like? Does any heuristic work? It turns out that not every heuristic works. Here's an example. Again, the blue values are the costs that are already given — these are the things I already have, and I can just run my search algorithm — but the red values are the heuristic values. Someone gave them to me for now; in general we would want to design them. So someone comes in and gives me these heuristic values, and what I want to do is compute the new cost values. The question is: is this heuristic good? I get my new cost values, and they look like this. Does this work? We don't have time, so I'm going to answer: it's not going to work. The reason is that we just got a negative edge there. I'm running uniform cost search — at the end of the day, A* is just uniform cost search — and I can't have negative edges. So it's just not a good heuristic to have here. The heuristics need to have specific properties, and we should think about what those properties are.
[01:04:59] One property that you would want heuristics to have is this idea of consistency — this is actually the most important property. So we've talked about heuristics; now I'm going to talk about their properties: heuristics h should be consistent. A consistent heuristic has two conditions. The first condition is that it satisfies the triangle inequality, and what that means is that the updated cost should be non-negative: Cost'(s, a) ≥ 0. That means the old Cost(s, a), plus h of the successor — I'm going to use s' for that — minus h(s), is greater than or equal to zero. That's the first condition. The second condition is that the future cost of s_end is equal to zero — the future cost of the end state should be zero — so the heuristic at the end state is also equal to zero.

[01:06:08] These are the properties we would want if we're talking about consistent heuristics, and they're kind of natural things to want. The first one is basically saying that the costs you end up with should be greater than or equal to 0, so you can run uniform cost search on them, but it's really talking about the triangle inequality: h(s) is an estimate of the future cost, so if from s I take an action with cost Cost(s, a) and add h(Succ(s, a)), that should be greater than or equal to h(s), the estimate of the future cost from s. That's all it's saying. And the last condition also makes sense.
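As a sketch, both conditions can be checked mechanically on a finite graph — the function name and the edge-list encoding here are mine, not the course's:

```python
def is_consistent(edges, h, end_states):
    """Check the two consistency conditions from the lecture:
    (1) triangle inequality: cost(s,a) + h(s') - h(s) >= 0 for every edge,
        i.e. every modified edge cost is non-negative;
    (2) h(s_end) = 0 for every end state."""
    triangle = all(c + h[t] - h[s] >= 0 for s, t, c in edges)
    zero_at_end = all(h[s_end] == 0 for s_end in end_states)
    return triangle and zero_at_end

edges = [("C", "B", 1), ("B", "C", 1), ("C", "D", 1),
         ("D", "C", 1), ("D", "E", 1), ("E", "D", 1)]
good = {"B": 3, "C": 2, "D": 1, "E": 0}
bad = {"B": 0, "C": 3, "D": 1, "E": 0}  # h(C)=3 makes C->D negative: 1+1-3

print(is_consistent(edges, good, ["E"]))  # True
print(is_consistent(edges, bad, ["E"]))   # False
```

The `bad` heuristic fails exactly the way the earlier example did: one modified edge cost goes negative, which uniform cost search cannot handle.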
[01:06:52] I do want my future cost at s_end to be 0, so the heuristic at s_end should also be equal to 0, because again, the heuristic is just an estimate of the future cost.

[01:07:02] All right, so what do I know about A* beyond that? One thing we know is that if h is consistent — if I have this consistency property — then A* is correct. There's a theorem that says A* is correct if h is consistent. We can look at that through an example. Let's say I start at s0, take a1 and end up at s1, take a2 to get to s2, and take a3 — so I have a path that looks like this. If I look at the cost of each of these edges — say Cost'(s0, a1) — well, what is that equal to? That's my updated cost: the old cost, Cost(s0, a1), plus the heuristic value at s1, minus the heuristic value at s0. So that is the cost of starting at s0 and taking a1.

[01:08:26] I'm going to write out the costs for the rest of this path to figure out the cost of the path — the cost of the path is just the sum of these edge costs. For (s1, a2) it is Cost(s1, a2) plus the heuristic at s2 minus the heuristic at s1; that is the new cost of that edge. And the new cost of the last edge, Cost'(s2, a3), is equal to the old cost Cost(s2, a3) plus the heuristic at s3 minus the heuristic at s2. So I've written out all these costs, and if I'm talking about the cost of a path, it's just these costs added up. If I add them up, what happens? A bunch of things get cancelled out — this term cancels, this term cancels — and what I end up with is that the sum of the new costs, the Cost'(s_{i−1}, a_i), is just equal to the sum of my old costs, Cost(s_{i−1}, a_i), plus my heuristic at the last state — the end state — minus the heuristic at s0.

[01:09:58] Now, I'm saying my heuristic is a consistent heuristic. What is a property of a consistent heuristic? The heuristic value at s_end should be equal to zero, so that term is also equal to zero. So what I end up with is: if I look at a path with the new costs, the sum of the new costs is just equal to the sum of the old costs minus some constant, and this constant is just the heuristic value at s0. So why is this important? Because of correctness: remember, we proved at the beginning of this lecture that uniform cost search is correct, so the cost it returns is optimal.
[01:10:42] A* is just uniform cost search with a new cost — A* is just running on this new cost — but this new cost is the same as the old cost minus a constant. So if I'm optimizing the new cost, it's the same thing as optimizing the old cost, and it is going to return the optimal solution. All right, that's basically the same thing that's on the slide. So that's one property: we talked about heuristics being consistent, and we have now shown that A* is correct — because it's uniform cost search — but it's correct only if the heuristic is consistent, because that consistency gets us the fact that this term is equal to 0 and the fact that the new edge costs are going to be non-negative, so I can run uniform cost search on them.
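Written out, the cancellation argument is:

```latex
\sum_{i=1}^{n} \mathrm{Cost}'(s_{i-1}, a_i)
  = \sum_{i=1}^{n} \bigl[\mathrm{Cost}(s_{i-1}, a_i) + h(s_i) - h(s_{i-1})\bigr]
  = \sum_{i=1}^{n} \mathrm{Cost}(s_{i-1}, a_i) + h(s_n) - h(s_0)
  = \sum_{i=1}^{n} \mathrm{Cost}(s_{i-1}, a_i) - h(s_0),
```

where the last step uses consistency, $h(s_n) = h(s_{\mathrm{end}}) = 0$. The modified path cost differs from the true path cost only by the constant $h(s_0)$, so minimizing one minimizes the other.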
[01:11:34] search on de um the next property that we have here for for a store is a star [01:11:37] we have here for for a store is a star is actually more efficient than uniform [01:11:39] is actually more efficient than uniform cost search and we kind of have already [01:11:41] cost search and we kind of have already seen this right like like the whole [01:11:43] seen this right like like the whole point of a star is to not explore [01:11:45] point of a star is to not explore everything and explore in a directed [01:11:46] everything and explore in a directed manner so if you remember uniform cost [01:11:50] manner so if you remember uniform cost search like how does it explore well it [01:11:52] search like how does it explore well it explores all the states that have a past [01:11:54] explores all the states that have a past cost that are less than the past cost of [01:11:57] cost that are less than the past cost of ascent so again remember the uniform [01:12:01] ascent so again remember the uniform cost search you're exploring with the [01:12:02] cost search you're exploring with the with the order of path cost of states [01:12:05] with the order of path cost of states and then we explore all those states [01:12:07] and then we explore all those states that [01:12:07] that Haskell's less than the den state a star [01:12:12] Haskell's less than the den state a star like the thing that they have stored us [01:12:14] like the thing that they have stored us is it explores less states so it [01:12:16] is it explores less states so it explores states that have a past cost [01:12:18] explores states that have a past cost less than past cost of the end state - [01:12:22] less than past cost of the end state - the heuristic so so if you kind of look [01:12:24] the heuristic so so if you kind of look at the right side the right side just [01:12:26] at the right side the right side just became become smaller right like the [01:12:29] became become smaller right like the 
right side for uniform cost search was [01:12:31] right side for uniform cost search was just past cost of SN now it is past cost [01:12:33] just past cost of SN now it is past cost of ascent - the heuristic so it just [01:12:35] of ascent - the heuristic so it just became smaller and then why did it [01:12:37] became smaller and then why did it become smaller because now I'm doing [01:12:39] become smaller because now I'm doing this more directed search I'm not [01:12:40] this more directed search I'm not searching everything uniformly around me [01:12:42] searching everything uniformly around me and and that's the whole point of the [01:12:44] and and that's the whole point of the heuristic okay and that makes it [01:12:46] heuristic okay and that makes it actually more efficient so and then kind [01:12:49] actually more efficient so and then kind of the interpretation of this is if H is [01:12:51] of the interpretation of this is if H is larger than then that's better right [01:12:53] larger than then that's better right like if my heuristic is as large as [01:12:55] like if my heuristic is as large as possible well well that is better [01:12:57] possible well well that is better because then I am kind of exploring a [01:12:59] because then I am kind of exploring a smaller like area to get to the solution [01:13:02] smaller like area to get to the solution the proof of this is like two lines so [01:13:04] the proof of this is like two lines so I'm gonna escape that so let me actually [01:13:06] I'm gonna escape that so let me actually show how this looks like so if I'm [01:13:10] show how this looks like so if I'm trying to get from a start to s and [01:13:12] trying to get from a start to s and again if I'm doing uniform cost search [01:13:14] again if I'm doing uniform cost search I'm uniformly exploring so like all [01:13:17] I'm uniformly exploring so like all states around me and that is equivalent [01:13:19] states around me and that is equivalent to 
assuming that the heuristic is equal [01:13:21] to assuming that the heuristic is equal to zero like it's basically uniform cost [01:13:23] to zero like it's basically uniform cost search is a star when the heuristic is [01:13:26] search is a star when the heuristic is equal to zero so what is the point of [01:13:29] equal to zero so what is the point of the heuristic the point of the heuristic [01:13:31] the heuristic the point of the heuristic is to estimate what the future cost this [01:13:32] is to estimate what the future cost this if I know what the future costs is then [01:13:35] if I know what the future costs is then then H of s is just equal to future cost [01:13:38] then H of s is just equal to future cost and then a and that would be awesome and [01:13:40] and then a and that would be awesome and and I only need to like explore that [01:13:42] and I only need to like explore that green kind of space and then the thing [01:13:44] green kind of space and then the thing I'm exploring is it's just the notes [01:13:46] I'm exploring is it's just the notes that are under minimum past cough and [01:13:48] that are under minimum past cough and call cost path and I'm not exploring [01:13:51] call cost path and I'm not exploring anything extra right like that's the [01:13:53] anything extra right like that's the most like efficient thing one can do in [01:13:56] most like efficient thing one can do in practice like I don't have access to [01:13:58] practice like I don't have access to future costs right and in practice if I [01:14:00] future costs right and in practice if I had access to future costs like the [01:14:01] had access to future costs like the problem was solved I have access to some [01:14:04] problem was solved I have access to some heuristic that is some estimate of the [01:14:06] heuristic that is some estimate of the future cost it's not as bad as uniform [01:14:08] future cost it's not as bad as uniform cost search it's getting close to future 
[01:14:10] cost search it's getting close to future costs like look the value of future cost [01:14:12] costs like look the value of future cost and you're kind of somewhere in between [01:14:14] and you're kind of somewhere in between so it is going to be more efficient than [01:14:16] so it is going to be more efficient than uniform cost search in some sense okay [01:14:19] uniform cost search in some sense okay all right so so basically the whole idea [01:14:23] all right so so basically the whole idea of a star is it kind of distorts edge [01:14:26] of a star is it kind of distorts edge edge cost and favor sees and States so [01:14:28] edge cost and favor sees and States so I'm going to add here that a star is [01:14:30] I'm going to add here that a star is efficient so that is the other thing [01:14:35] okay all right so so these are all cool [01:14:39] okay all right so so these are all cool properties one more property about here [01:14:42] properties one more property about here is six and then after that we can talk [01:14:43] is six and then after that we can talk about your lack stations so so there's [01:14:46] about your lack stations so so there's also this other property called [01:14:47] also this other property called admissibility [01:14:48] admissibility which is something that we have kind of [01:14:50] which is something that we have kind of been talking about already right like [01:14:51] been talking about already right like we've been talking about how this [01:14:52] we've been talking about how this heuristic [01:14:53] heuristic should get close to future cost and [01:14:55] should get close to future cost and should be an estimate of the future cost [01:14:57] should be an estimate of the future cost so an admissible heuristic is a [01:14:59] so an admissible heuristic is a heuristic where H of s is less than or [01:15:02] heuristic where H of s is less than or equal to future cost and then the cool [01:15:04] equal to future cost and 
[01:15:06] And the cool thing is: if you already have consistency, then you have admissibility too — if you already have that property, then you have admissibility as well. So another property is admissible, which means h(s) ≤ FutureCost(s). The proofs of these are again mostly one-liners — this one is more than one line, but it's actually quite easy; it's in the notes, and you can use induction to prove that if you have consistency, then you're going to have admissibility too.

[01:15:46] Okay, so we've just talked about how A* is this efficient thing. We haven't talked about how you come up with heuristics, but we have talked about consistent heuristics, which are going to be useful — they give us admissibility, they give us correctness — and how A* is going to be this very efficient thing.
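The induction mentioned above (it is worked out in the course notes) can be sketched like this, working backwards from the end state:

```latex
% Base case, from consistency:
h(s_{\mathrm{end}}) = 0 = \mathrm{FutureCost}(s_{\mathrm{end}})

% Inductive step: triangle inequality, then the inductive hypothesis
% h(\mathrm{Succ}(s,a)) \le \mathrm{FutureCost}(\mathrm{Succ}(s,a)):
h(s) \le \mathrm{Cost}(s, a) + h(\mathrm{Succ}(s, a))
     \le \mathrm{Cost}(s, a) + \mathrm{FutureCost}(\mathrm{Succ}(s, a))

% The bound holds for every action a, hence for the minimizing one:
h(s) \le \min_a \bigl[\mathrm{Cost}(s, a) + \mathrm{FutureCost}(\mathrm{Succ}(s, a))\bigr]
     = \mathrm{FutureCost}(s)
```

Since the bound holds for every action, it holds for the best action, and the minimum over actions of cost-plus-future-cost is exactly the future cost of s — which is the admissibility statement h(s) ≤ FutureCost(s).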
efficient thing [01:16:01] going to be this very efficient thing but we actually have not talked about [01:16:03] but we actually have not talked about how to come up with heuristics so let's [01:16:06] how to come up with heuristics so let's spend the next yeah couple of minutes [01:16:09] spend the next yeah couple of minutes talking about talking about how to come [01:16:12] talking about talking about how to come up with heuristics and then the main [01:16:14] up with heuristics and then the main idea here is just relax the problem just [01:16:17] idea here is just relax the problem just relaxation so so what are so so the way [01:16:20] relaxation so so what are so so the way we come up with heuristics is we pick [01:16:22] we come up with heuristics is we pick the problem and just make it easier and [01:16:24] the problem and just make it easier and solve that easier problem so so that is [01:16:25] solve that easier problem so so that is kind of the whole idea of it so remember [01:16:28] kind of the whole idea of it so remember the HMS is supposed to be close to [01:16:31] the HMS is supposed to be close to future cost [01:16:33] future cost and some of these problems can be really [01:16:35] and some of these problems can be really difficult right so this so if you have a [01:16:37] difficult right so this so if you have a lot of constraints and it becomes harder [01:16:39] lot of constraints and it becomes harder to solve the problem so if you relax it [01:16:41] to solve the problem so if you relax it and we just remove the constraints we [01:16:42] and we just remove the constraints we are solving a much easier problem and [01:16:44] are solving a much easier problem and that could be used as a heuristic as a [01:16:46] that could be used as a heuristic as a value of heuristic that estimates what [01:16:48] value of heuristic that estimates what the future cost this so so you want to [01:16:53] the future cost this so so you want to remove constraints 
[01:16:54] And when we remove constraints, the cool thing that happens is that sometimes we get closed-form solutions, sometimes we get easier search problems that we can solve, and sometimes we get independent subproblems, and solving those gives us a good heuristic. So that is my goal: I want to find these heuristics. Let me go through a couple of examples. Say I have a search problem where I want the triangle to get to the circle, and there are all these walls in the way; that just seems really difficult. So what is a good heuristic here? I'm going to relax the problem: remove all those walls, just knock them down, and solve that problem instead. That seems much easier. In fact, now I have a closed-form solution for getting the triangle to the circle.
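With the walls knocked down and unit-cost moves in the four directions, that closed form is just the Manhattan distance. A minimal sketch (grid-coordinate states assumed):

```python
# Relaxation: no walls, unit-cost moves up/down/left/right. The cheapest
# path cost between two grid cells then has a closed form, the Manhattan
# distance, which can be used directly as h(s) in A*.
def manhattan_h(state, goal):
    (r, c), (gr, gc) = state, goal
    return abs(r - gr) + abs(c - gc)

print(manhattan_h((0, 0), (3, 4)))   # 7: three rows plus four columns
```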
[01:17:41] I can just compute the Manhattan distance and use it as a heuristic. Again, it's not the actual future cost, but it is an approximation of it. So you can usually think of heuristics as optimistic views of what the future cost is: an optimistic view of the problem. If there were no walls, how would I get from one location to another? The solution to that gives you an estimate of the future cost, which is h(s). Or take the tram problem: say we have a more difficult version of it, with a constraint that says you can't take more tram actions than walk actions. So now this is my search problem.
[01:18:29] I need to solve this, and it seems kind of difficult. We talked last time about how to come up with states for it, and even that seems hard: I need to track the location and the difference between the number of walk and tram actions, so I have on the order of n² states now. Instead of doing that, let me just remove the constraint and relax it. After relaxing, I have a much easier search problem to deal with: I only have the location, and I can just work with that, and everything will be great.

[01:19:04] So the idea for this middle case is: if I remove these constraints, I get these easier search problems, these relaxations, and I can compute the future cost of a relaxation using my favorite techniques, like dynamic programming or uniform cost search. But one thing to notice is that I need to compute it for states 1 through n, because the heuristic is a function of the state: I actually need the future cost of the relaxed problem for all states from 1 through n, and that gives me a better estimate. There are some engineering details here. We are looking for future costs, so if you plan to use uniform cost search, maybe because dynamic programming doesn't work in your setting, you need a bit of engineering to make it work: remember, uniform cost search works on past costs, not future costs, so you need to create a reversed problem in which you can actually compute future costs. So, a few engineering things.
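That reversed-problem trick can be sketched in a few lines. A hedged sketch, where the tram dynamics (walk from s to s+1 at cost 1, tram from s to 2s at cost 2) follow the course's running example and the function names are made up:

```python
import heapq

# Dijkstra / uniform cost search computes cheapest *past* costs from a
# source, but the heuristic needs FutureCost(s) for every state s = 1..n.
# Trick: reverse every edge and run Dijkstra from the goal; distances in
# the reversed graph are exactly costs-to-goal in the original problem.
def future_costs(n, succ, goal):
    """succ(s) -> list of (successor, cost); states are 1..n."""
    rev = {s: [] for s in range(1, n + 1)}
    for s in range(1, n + 1):
        for t, cost in succ(s):
            rev[t].append((s, cost))             # reversed edge: t -> s
    dist = {s: float('inf') for s in range(1, n + 1)}
    dist[goal] = 0.0
    frontier = [(0.0, goal)]
    while frontier:
        d, s = heapq.heappop(frontier)
        if d > dist[s]:
            continue                             # stale queue entry
        for t, cost in rev[s]:
            if d + cost < dist[t]:
                dist[t] = d + cost
                heapq.heappush(frontier, (d + cost, t))
    return dist                                  # dist[s] = FutureCost(s)

# Relaxed tram problem: walk s -> s+1 (cost 1) or tram s -> 2s (cost 2).
n = 10
def succ(s):
    return [(t, c) for t, c in [(s + 1, 1), (2 * s, 2)] if t <= n]

h = future_costs(n, succ, n)   # e.g. h[9] = 1 (walk), h[5] = 2 (tram)
```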
[01:20:08] But beyond that, it is basically just running the search algorithms we know on these relaxed problems; that gives us a heuristic value, we put it into our original problem, and we go solve it. Okay, another cool thing heuristics give us is this idea of independent subproblems. Here's another example: I want to solve the 8-puzzle, moving tiles around to reach a new configuration, and that seems hard. Again, a relaxation is to assume the tiles can overlap. The original problem says the tiles cannot overlap; I'm going to relax it and say you can go wherever you want and overlap. That is much simpler, and now I have eight independent problems, one for moving each tile from its location to its goal location, and I have a closed-form solution for each of them.
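Under that relaxation the heuristic is the sum of the tiles' individual Manhattan distances. A minimal sketch (row-major 3x3 board encoding assumed, blank encoded as 0):

```python
# Relaxation: tiles may overlap, so each tile moves independently and its
# cheapest cost home is its Manhattan distance; the heuristic sums over
# the eight tiles (the blank, 0, contributes nothing).
def eight_puzzle_h(board, goal):
    """board, goal: tuples of 9 ints (0 = blank), row-major 3x3."""
    pos = {v: (i // 3, i % 3) for i, v in enumerate(board)}
    gpos = {v: (i // 3, i % 3) for i, v in enumerate(goal)}
    return sum(abs(pos[v][0] - gpos[v][0]) + abs(pos[v][1] - gpos[v][1])
               for v in range(1, 9))

goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
start = (1, 2, 3, 4, 5, 6, 7, 0, 8)   # one slide away from the goal
assert eight_puzzle_h(start, goal) == 1
```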
[01:20:58] Because each of those is again a Manhattan distance. That gives me a heuristic; it's an estimate, it's not perfect, but I can use that estimate in my original search problem to solve it. So these were some examples of this idea of removing constraints to come up with better heuristics: knocking down walls, letting the tram travel freely, letting puzzle pieces overlap. That lets you solve a new, easier problem; in effect, you're reducing edge costs from infinity down to some finite cost. All right, I'm going to wrap up here, and I guess we'll talk about these last few slides next time, since we're running late, but I think you've got the main idea. See you next time.

================================================================================ LECTURE 019 ================================================================================ Markov Decision Processes 1 - Value Iteration |
Stanford CS221: AI (Autumn 2019) Source: https://www.youtube.com/watch?v=9g32v7bK3Co --- Transcript [00:00:05] Okay, let's start, guys. The plan is to take two days to catch up, since we're a little behind; that's okay. Today I want to talk about MDPs, Markov decision processes. My plan is to talk about that for the first hour, then spend ten minutes on the previous lecture: remember, we went over relaxation kind of quickly, so maybe we can go over that again. Then in the last ten minutes I want to talk about the project and the plan for it, how you should think about it; it's coming up, so we should start talking about it. This is an optimistic plan, though; let's see how it goes, but that's the current plan. Okay, all right, let's get into it: Markov decision processes. Let's start with a question, and let's actually do this one just by hand.
[00:00:57] You don't need to go to the website. The question is: it's Friday night, you want to go to Mountain View, you have a bunch of options, and you want to get there in the least amount of time. Which of these modes of transportation would you use? How many of you would bike? No one. How many of you would drive? That's popular. Caltrain? Some people would take the Caltrain, sounds good. Uber or Lyft? We have a good distribution. And a good number of you would go on a flight; as flying cars become a thing, that could be an option in the future, and there are actually a lot of startups working on flying cars.

[00:01:39] But as you think about this problem, the way to think about it is that there are a bunch of uncertainties in the world. It's not necessarily a search problem: you could bike and get a flat tire, and you don't know that in advance, so you have to take it into account. If you're driving, there could be traffic; if you're taking the Caltrain, there are all sorts of delays with the Caltrain; and there are all sorts of other uncertainties in the world that you need to think about. So it's not just a pure search problem where you pick your route and go with it: things can happen that affect your decision. And that takes us to Markov decision processes. We've talked about search problems, where everything was deterministic; now we're talking about the next class of state-based models, Markov decision processes. The idea is that you take actions, but you might not actually end up where you expected to.
[00:02:30] Because there's this nature around you, this world around you, that is uncertain and does stuff you didn't expect. So far we've talked about search problems: you start with a state, you take an action, and you deterministically end up in a new state. If you remember the successor function, Succ(s, a) would always give us s', and we would deterministically end up in s'. So in the graph up there, if you start in s and decide to take action 1, you're going to end up in a; there's no other option. And the solution to these search problems was a path, a sequence of actions: if I know I take action 1, then action 3, then action 2, I know exactly where I'm going to end up.

[00:03:18] Okay, so when we think about Markov decision processes, that is the setting where we have uncertainty in the world and we need to take it into account. The idea is: you start in a state, you decide to take an action, but then you can randomly end up in different states, say s1' or s2', because there are so many other things happening in the world, and you need to worry about that randomness and make decisions based on it. And this comes up pretty much everywhere, in every application. It comes up in robotics: if you have a robot that wants to go pick up an object, you decide on your strategy and everything is great, but when it comes to actually moving the robot and getting it to do the task, the actuators can fail.
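That contrast between a deterministic successor and a distribution over next states can be sketched as follows; the state names, probabilities, and rewards here are invented for illustration:

```python
import random

# Search problem: succ(s, a) returns exactly one next state.
def succ(s, a):
    return 'A' if a == 1 else 'B'                  # deterministic

# MDP: taking action a in state s yields a *distribution* over next
# states, written as (nextState, prob, reward) triples.
def succ_prob_reward(s, a):
    if a == 'grasp':                               # actuators can fail
        return [('holding', 0.8, 10), ('dropped', 0.2, -1)]
    return [(s, 1.0, 0)]                           # 'wait' does nothing

def sample_transition(s, a):
    """Draw one next state and reward from the transition distribution."""
    r, acc = random.random(), 0.0
    for s2, p, reward in succ_prob_reward(s, a):
        acc += p
        if r < acc:
            return s2, reward
    return s2, reward                              # guard against rounding
```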
[00:04:01] Or you might have all sorts of obstacles around you that you didn't think about. So there is uncertainty about the environment, or uncertainty about your model, like your actuators, that you don't necessarily think about, and in reality it affects your decisions and where you end up. This comes up in other settings, like resource allocation: maybe you're deciding what product to produce, and that depends on customer demand, which you might not have a good model of; it's uncertain. It really depends on what products customers want and what they don't, and even if you have a model, it's not going to be accurate, and you need to do resource allocation under that uncertainty about the world. A similar thing happens in agriculture: for example, you want to decide what to plant, but you might not be sure about the weather, whether it's going to rain, or whether the crops are going to yield or not. So there is a lot of uncertainty in these decisions, and that takes these problems beyond search problems: they become problems where we have to make decisions under uncertainty.

[00:05:09] All right, another example: this is the volcano crossing example. We have an island, and you're on one side of it, in that black square over there. What we want to do is go from this black square to the other side of the island, where we have the scenic view; that's going to give us a lot of reward and happiness. So my goal is to go from one side of the island to the other, but there's a caveat.
[00:05:35] The caveat is that there's a small volcano in the middle of the island that I need to get past, and if I fall into it I'm going to get a -50 reward (really more like minus infinity, but for this example imagine a -50 reward for falling into the volcano). All right, so I have this slip probability setting here on the side. If my slip probability is zero, meaning I'm sure I'm not going to fall in, should I cross the island? Yes, I should cross, because I'm not going to fall into that -50; the slip probability is zero, I'll get my +20 reward, and everything will be great. But we've been talking about how the world is stochastic, and the slip probability is not going to be zero; maybe it's ten percent. So if there is a ten percent chance of falling into the volcano, how many of you would still cross the island?
[00:06:31] A good number. The optimal solution is actually shown by these arrows here, and yes, the optimal solution is still to cross the island. We're going to talk about all these terms, but the value here is basically the value you get at that beginning state; it's the expected utility you're going to get (we'll define it properly). It goes down, because there is some probability of falling into the volcano, but the best thing to do is still to cross. How about 20 percent? How many of you would do it with 20 percent? Fewer people; still, it turns out the optimal strategy is to cross. 30 percent? One person. With 30 percent, that's actually the point where you'd rather not cross, because there's a volcano there, and with that large a probability you could fall in.
[00:07:29] You could fall into the volcano, and the value goes down. So these are the types of problems we're going to work with. Yes? So the 20 is the reward you're going to get at that state, and the value is something you compute and propagate back; we'll talk in detail about how to compute the value. All right, so that was just an example of a Markov decision process. What we want to do in this lecture is, first, model these types of systems as Markov decision processes; then we'll talk about inference-type algorithms, how we do inference and come up with the best strategy. In the middle I'll talk about policy evaluation, which is not an inference algorithm but is a step towards it: it's the idea that if someone hands me a policy, I can evaluate how good it is.
[00:08:18] And we'll talk about value iteration, which tries to figure out the best policy to take. So that's the plan for today. Next lecture we're going to talk about reinforcement learning, where we don't actually know what the reward is and we don't know what the transitions are; that's the learning part of these MDP lectures, and Reid is actually going to do that lecture next, on Wednesday. Okay, so let's get into Markov decision processes. We have a bunch of examples throughout this lecture, so let's look at another one; I actually need volunteers for this. In this example we have a game, and the idea is that at any point in time you can choose between two actions: you can either stay, or you can quit.
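This dice game, with the payoffs described next (quit pays $10 and ends the game; stay pays $4, then with probability 1/3, a die showing 1 or 2, the game ends, else you choose again), can be sketched as an MDP and the "always stay" policy evaluated. The interface below mimics a succProbReward-style transition function; the exact names are just an illustration:

```python
# The dice game as an MDP: states 'in' (still playing) and 'end'.
def succ_prob_reward(state, action):
    """Return a list of (newState, prob, reward) triples."""
    if state == 'end':
        return []
    if action == 'quit':
        return [('end', 1.0, 10)]                  # $10, game over
    if action == 'stay':
        return [('in', 2/3, 4), ('end', 1/3, 4)]   # $4, die may end it
    raise ValueError(action)

# Policy evaluation for "always stay": V satisfies V = 4 + (2/3) V,
# whose fixed point is V = 12; iterating the update converges to it.
v = 0.0
for _ in range(100):
    v = sum(p * (r + (v if s2 == 'in' else 0.0))
            for s2, p, r in succ_prob_reward('in', 'stay'))

print(round(v, 6))   # 12.0: in expectation, staying beats quitting ($10)
```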
[00:09:06] Okay, if you decide to quit, I'm going to give you $10 (well, not actually, but imagine I'm going to give you $10) and then we'll end the game. And if you decide to stay, you're going to get $4 and then I'll roll the dice: if I get a one or a two we'll end the game, otherwise you continue to the next round and you can decide again. Okay, so who wants to play? All right, volunteer: do you want to stay or quit? [Quit.] So that was easy, you got your $10. Does anyone else want to play? [Stay.] Oh, you got $8, sorry. You kind of get the idea here, right? You have these actions, and with one of them, quitting, you deterministically get your $10 and you're done; with the other one it's probabilistic, and you want to see which one is better and what the best policy to take in this setting would be. So we'll come back to this question; we'll formalize this and go over it.

[00:10:26] Okay, so then you need to actually compute the expected utility, right? And that's what we want to do. So you might say: oh, I want to stay and then I get my four dollars, and then I want to quit and then I get 14, and maybe that is the way to go. That could be a strategy, but before doing that we are going to define what the optimal policy would be. One other thing to keep in mind for this particular problem (I'll talk about it in a minute): you find a policy, but the policy, the way we define it, is a function of state. So if you decide to stay, that is your policy; if you decide to not stay, that is your policy. We're not allowing switching right now; I'll talk about this later in the lecture, but I'll come back to this problem.
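The point that a policy is a function of state only can be sketched in a couple of lines of Python (the names `policy_stay`, `policy_quit`, and `act` are illustrative, not from the course code):

```python
# A policy maps each state to exactly one action -- it is a function of
# state only, so the agent cannot switch strategies mid-game.
policy_stay = {"in": "stay"}   # always stay while in the game
policy_quit = {"in": "quit"}   # quit immediately

def act(policy, state):
    """Return the action the policy prescribes for this state."""
    return policy[state]

print(act(policy_stay, "in"), act(policy_quit, "in"))  # stay quit
```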
[00:11:08] Okay, so if you decide that your policy, the thing you want to do, is to just keep staying, this is the probability distribution over the total rewards you're going to get: you're going to get four with some probability, then if you're lucky you're going to get eight, if you're luckier you're going to get 12, and if you're luckier still you're going to get 16, but the probabilities come down really quickly. So the thing we care about in this setting is the expected utility: in expectation, if I run this and average over all the possible paths, what value do I get? And for this particular problem it turns out that in expectation, if you decide to stay, you should get 12. So you got really unlucky that you got 8, but in general, in expectation, you should decide to stay.

[00:11:57] And we actually want to spend a little bit of time in this lecture thinking about how we get that 12: how to go about computing this expected utility, and based on that, how to decide what policy to use. Okay, and if you decide to quit, the expected utility is kind of obvious, right? Because you're quitting, and with probability one you're getting ten dollars, so ten dollars is the expected utility of quitting. [Student question] So when I said you roll a die: if you get a one or a two, we end the game, and if you get anything else, the other 2/3 of the time, you continue. So that's where the 1/3 and 2/3 come from. Okay, all right. I'll come back to this example; this is actually the running example throughout this lecture, so you'll see what the lecture is about.
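That 12 can be checked two ways: by solving the recurrence for the always-stay policy (each round pays $4, and with probability 2/3 the game continues, so V = 4 + (2/3)V, i.e. (1/3)V = 4, giving V = 12), and by simulation. A minimal sketch, assuming the game exactly as described above:

```python
import random

# Closed form: V = 4 + (2/3) * V  =>  (1/3) * V = 4  =>  V = 12
v_stay = 4 * 3

def simulate_stay(rng):
    """Play the dice game with the always-stay policy; return total reward."""
    total = 0
    while True:
        total += 4                 # staying pays $4 whatever the die shows
        if rng.random() < 1/3:     # die shows 1 or 2: the game ends
            return total

rng = random.Random(0)
n = 100_000
avg = sum(simulate_stay(rng) for _ in range(n)) / n
print(v_stay, round(avg, 2))  # the simulated average lands close to 12
```

Quitting, by contrast, gives exactly 10 with probability one, so in expectation staying is the better policy.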
[00:12:59] Okay, so actually, I do want to finish in an hour, and that's why maybe I'm rushing things a little bit, but we are going to talk about this problem throughout the class, so don't worry if it's not clear; at the end we can clarify things. All right, so I do want to formalize this problem, and the way I want to formalize it is as a Markov decision process.

[00:13:27] So in Markov decision processes, similar to search problems, you're going to have states. In this particular game I'm going to have two states: I'm either in the game or I'm out of the game. I'm in an end state when everything has ended: you're out of the game, you're done. Okay, so those are my states. Then in each of these states I can take an action, and if I'm in the "in" state I can take two actions: I can either decide to stay or I can quit. If I decide to stay from the "in" state, that takes me to something I'm going to call a chance node. A chance node is a node that represents a state and an action. It's not really a state (the blue things are my states), but I'm creating these chance nodes as a way of working through this example to see where things are going. So these blue states are my states, and these chance nodes are over state-action pairs. Basically, this node tells me that I started in "in" and decided to stay, and the chance node here tells me that I started in "in" and decided to quit. I deterministically go to the chance node, but then from the chance node, that's where I'm introducing the probabilities: from the chance node I can probabilistically end up in different states.

[00:15:08] In the case of quit it's also deterministic: we say with probability one I'm going to end up in this end state. I'm going to draw that with the edge that comes out of my chance node, and I'm going to say with probability one I'm going to get $10 and just be done. What if you're at this one? This is actually where interesting things can happen: with probability 2/3 I'm going to go back to "in" and get $4, or with probability 1/3 I'm going to end up in "end", also with $4. So that is my Markov decision process. Maybe you can keep track of a list of the things we are defining in this lecture: we just defined states, and then we said we're going to have these chance nodes, because from a chance node, which state you come out to is probabilistic, depending on what happens in nature.
[00:16:13] Right, this is the decision I've made; now nature kind of decides which one we are going to end up at, and based on that we move forward. All right, so more formally, we have a bunch of things to define for an MDP; similar to search problems, we now need to define the same kind of set of things. We have a set of states: in this case my states are "in" and "end". We have a start state: I'm starting with "in", so that's my start state. I have actions as a function of state: when I ask what the actions of the "in" state are, my actions are going to be stay or quit; what are the actions of "end"? Nothing, the end state doesn't have any actions that come out of it. And then we have these transition probabilities: more formally, a transition probability takes a state, an action, and a new state, so s, a, s', and tells me the transition probability of that; it's 1/3 in this case. And then I have a reward, which tells me how rewarding that was: four dollars.

[00:17:21] So when I'm defining my MDP, the new thing I'm defining is this transition probability. It tells me: if you're in state s and take action a, and you end up in s', what is the probability of that? I'm in "in", I decide to stay, and I end up in "end": what's the probability of that? One-third. Maybe I'm in "in", I decide to quit, and I end up in "end": what's the probability of that? It's equal to one. Okay, and then over the same state, action, and next state s' that we end up at, we are going to define a reward, which tells me how much money I got, how good that was. It was four dollars in this case, or, if I decide to quit, I got ten dollars. And if you remember, in the case of search problems we were talking about cost; I'm just flipping the sign here. There we wanted to minimize cost, here we want to maximize the reward; it's just a more optimistic view of the world, I guess. So that is what the rewards are going to be. You also define this IsEnd function, which, again similar to search problems, just checks whether we are in an end state or not. And in addition to that, we have something called a discount factor: it's this value gamma you choose between zero and one. I'll talk about this later; don't worry about it right now, but it's a thing we define for our MDPs.

[00:18:49] All right, so how do I compare this with search? Again, these were the things we had in a search problem: we had the successor function that would deterministically take me to s', and we had this cost function that would tell me the cost of being in state s and taking action a.
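The pieces just listed (states, start state, Actions(s), transition probabilities T(s, a, s'), Reward(s, a, s'), IsEnd(s), and the discount factor gamma) can be collected into a small class. A sketch for the stay/quit game; the class and method names here are illustrative, not the course's exact code:

```python
class DiceGameMDP:
    """The stay/quit dice game as an MDP."""

    def start_state(self):
        return "in"

    def is_end(self, state):
        return state == "end"

    def actions(self, state):
        # the end state has no actions coming out of it
        return [] if self.is_end(state) else ["stay", "quit"]

    def succ_prob_reward(self, state, action):
        """Return a list of (s', T(s, a, s'), Reward(s, a, s')) triples."""
        if action == "quit":
            return [("end", 1.0, 10)]          # deterministic: $10 and done
        # stay: $4 either way; die shows 1 or 2 (prob 1/3) ends the game
        return [("in", 2/3, 4), ("end", 1/3, 4)]

    def discount(self):
        return 1.0  # gamma; discussed later in the lecture

mdp = DiceGameMDP()
print(mdp.succ_prob_reward("in", "stay"))
```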
[00:19:04] So the major things that changed are: instead of a successor function, I have transition probabilities, these T's that basically tell me the probability of starting in s, taking action a, and ending up in s'; and the cost just became a reward. Those are the major differences between search and MDPs, because things are not deterministic. All right, so that was the formalism; now I can define any MDP, any Markov decision process. One thing to point out is that these transition probabilities T basically specify the probability of ending up in state s' if you take action a in state s. So these are probabilities, right? For example (we have done this example, but let's just do it on the slides again): if I'm in state "in" and I take action quit, I end up in "end"; what's the probability of that? One. If I'm in state "in" and I take action stay and I end up in state "in" again, what's the probability of that? Two-thirds. And if I'm in "in", I take action stay, and I end up in "end", the probability of that is one-third.

[00:20:21] And these are probabilities, so they need to add up to one. But one thing to notice is: what exactly adds up to one? All of the entries in a column of the table are not going to add up to one. What adds up to one is this: if you consider all the possible different s-primes that you could end up at, those probabilities add up to one. So if you look at this table again: being in "in" and taking action stay, the probabilities we have for the different s-primes are two-thirds and one-third, and those two are the ones that add up to one. And in the first case, if you're in "in" and you decide to quit, then whatever s-primes you could end up at (in this case it's just the end state), those probabilities add up to one. More formally, what that means is: if I sum over s', these new states I could end up at, the transition probabilities need to add up to one, because they're probabilities describing what can happen when I take an action. And these transition probabilities are going to be non-negative, because they are probabilities; that's another property. The usual things.

[00:21:42] All right, so let's formalize another problem, and this time let's actually try to code it up. What is this problem? This is the tram problem.
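That property (for every state s and action a, the probabilities over successor states s' are non-negative and sum to one) is easy to check mechanically. A sketch using the numbers from the table; storing T as nested dicts is my choice here, not the lecture's:

```python
# T[s][a] maps each successor s' to T(s, a, s')
T = {
    "in": {
        "stay": {"in": 2/3, "end": 1/3},
        "quit": {"end": 1.0},
    },
}

for s, by_action in T.items():
    for a, succ in by_action.items():
        # probabilities are non-negative...
        assert all(p >= 0 for p in succ.values())
        # ...and it is the row over s' (fixed s and a) that sums to one,
        # not a column of the table
        assert abs(sum(succ.values()) - 1.0) < 1e-9, (s, a)
print("every (s, a) row sums to 1")
```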
[00:21:52] Remember, I have blocks 1 through n, and I have two possible actions. I can either walk from state s to state s+1, or I can take the magic tram that takes me from state s to state 2s. If I walk, that costs 1 minute, which means the reward is minus 1; if I take the tram, that costs 2 minutes, which means the reward is minus 2. And the question was: how do we travel from 1 to n in the least amount of time? So nothing here is probabilistic yet, right? So I'm going to add an extra thing, which says the tram is going to fail with probability 0.5. I might decide to take the tram at some point, and that tram can fail with probability 0.5; if it fails, I stay in my state, I don't go anywhere, and in this case we're assuming you still lose the 2 minutes. So if I decide to take the tram, I'm going to lose 2 minutes; maybe it'll fail, maybe it won't.

[00:22:54] Okay, all right, so let's try to formalize this. We're going to take our tram problem from two lectures ago (this is from Search 1) and just copy that. All right, so this is what we had from last time: we had this transportation problem, and we had all these algorithms to solve the search problem. We don't really need them, because we have a new problem, so let's just get rid of them. Now I want to formalize an MDP, so it's a TransportationMDP. Okay, the initialization looks okay, the start state looks okay (I'm starting from 1), isEnd looks okay. So the thing I'm going to change is, first off, I need to add this actions function. What would actions do? It's going to return a list of potential actions given a state.
[00:23:53] So I just copy-pasted stuff from down there and edited it; it's going to return a list of valid actions. Okay, so what are the valid actions I can take? I can either walk or I can tram. So I'm going to remove all these extra things I had from before and just keep it to: I'm either walking or I'm taking the tram, as long as it's a valid state. So that looks right for actions. The only other thing we had was the successor-and-cost function, so now we want to change that to return these transition probabilities and the reward; it's basically successor probabilities and reward. So I'm putting those two together: similar to before, where we had successor and cost, now I'm returning probabilities and rewards. What this function is going to return is the new state s' I'm going to end up at, the probability value for that, and the reward of that.

[00:24:48] So given that I'm starting in state s and taking action a, what are the potential s-primes I can end up at, and with what probabilities? What is T(s, a, s'), and what is the reward, Reward(s, a, s')? I want a function that just returns these so I can call it later. All right, so I need to check what happens for each one of these actions. For action walk, what new state am I going to end up at? Well, I'm going to end up at s plus one; it's a deterministic action, so I end up there with probability one. And what's the reward of that? Minus one, because it costs one minute, so it's a reward of minus one. Then for action tram we do much the same thing, but you have two options here: I can end up in 2s if the tram doesn't fail, with probability 0.5.
that [00:25:55] this probability 0.5 that cost that reward of that is minus 2 or the other [00:25:58] reward of that is minus 2 or the other option is I'm going to end up in state s [00:26:01] option is I'm going to end up in state s cuz I didn't go anywhere because we [00:26:03] cuz I didn't go anywhere because we probability point 5 to Tran [00:26:04] probability point 5 to Tran to fail and that cut that nerve world of [00:26:07] to fail and that cut that nerve world of that is - - and that's pretty much it [00:26:10] that is - - and that's pretty much it that that is my my MVP so I can just [00:26:14] that that is my my MVP so I can just define this for a city with let's say [00:26:16] define this for a city with let's say ten blocks oh and we need to have the [00:26:19] ten blocks oh and we need to have the discount factor but we'll talk about [00:26:21] discount factor but we'll talk about that later let's say it's just one for [00:26:23] that later let's say it's just one for now yeah and they'll use right I'm [00:26:26] now yeah and they'll use right I'm writing this other states functions were [00:26:28] writing this other states functions were later but that look right just [00:26:32] later but that look right just formalized this MVP so let's check if it [00:26:37] formalized this MVP so let's check if it does the right thing so maybe we want to [00:26:39] does the right thing so maybe we want to know what are the actions from state 3 [00:26:41] know what are the actions from state 3 what are the actions from state 3 oh we [00:26:44] what are the actions from state 3 oh we need to remove this you to a function [00:26:46] need to remove this you to a function from before because we don't have it in [00:26:47] from before because we don't have it in the folder move that what are the [00:26:51] the folder move that what are the actions from state 3 I have 10 blocks if [00:26:55] actions from state 3 I have 10 blocks if I'm in state 3 I can either walk or 
tram [00:26:57] I'm in state 3 I can either walk or tram right or one of them is fine right so so [00:27:01] right or one of them is fine right so so that did the right thing [00:27:03] that did the right thing maybe we want to just check if this [00:27:06] maybe we want to just check if this successor probability and your horde [00:27:08] successor probability and your horde function does the right thing so maybe [00:27:10] function does the right thing so maybe maybe we can try that out for state 3 [00:27:13] maybe we can try that out for state 3 and walk so so for step 3 and action [00:27:15] and walk so so for step 3 and action walk then what do we get well we end up [00:27:18] walk then what do we get well we end up in 4 and that it that is with [00:27:22] in 4 and that it that is with probability 1 with the reward of minus 1 [00:27:25] probability 1 with the reward of minus 1 okay let's try that for tram again [00:27:31] okay let's try that for tram again remember tram can fail so I'm gonna get [00:27:33] remember tram can fail so I'm gonna get two things here so these are the things [00:27:36] two things here so these are the things I'm gonna get it for tram I'm going to [00:27:38] I'm gonna get it for tram I'm going to either end up in 6 with probability 0.5 [00:27:40] either end up in 6 with probability 0.5 with the reward of minus 2 or I will not [00:27:43] with the reward of minus 2 or I will not go anywhere I'm still at 3 with [00:27:45] go anywhere I'm still at 3 with probability 0.5 and that is with the [00:27:48] probability 0.5 and that is with the reward of minus 2 okay all right so that [00:27:54] reward of minus 2 okay all right so that was just a tram problem and we formalize [00:27:58] was just a tram problem and we formalize it as an MDP again the reason it's an [00:28:01] it as an MDP again the reason it's an MVP here is is that the tram can fail [00:28:03] MVP here is is that the tram can fail with probability 0.5 so we added that in 
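The class being live-coded here can be sketched roughly as follows. The method names and the `startState`/`isEnd` helpers are my reconstruction of the demo, not the course's exact code; the dynamics (walk to s+1 for reward -1, tram to 2s with probability 0.5 for reward -2) are taken from the transcript.

```python
class TransportationMDP:
    """Tram MDP: states are blocks 1..N; walk is deterministic, tram can fail."""

    def __init__(self, N):
        self.N = N  # number of blocks in the city

    def startState(self):
        return 1

    def isEnd(self, state):
        return state == self.N

    def actions(self, state):
        # Valid actions: walk to state+1, or tram to 2*state, if still in bounds.
        result = []
        if state + 1 <= self.N:
            result.append('walk')
        if 2 * state <= self.N:
            result.append('tram')
        return result

    def succProbReward(self, state, action):
        # Returns a list of (newState, probability, reward) triples.
        result = []
        if action == 'walk':
            result.append((state + 1, 1.0, -1))  # deterministic, costs 1 minute
        elif action == 'tram':
            result.append((2 * state, 0.5, -2))  # tram works
            result.append((state, 0.5, -2))      # tram fails: stay put, still pay 2
        return result

    def discount(self):
        return 1.0  # just 1 for now, as in the lecture

mdp = TransportationMDP(N=10)
print(mdp.actions(3))                 # ['walk', 'tram']
print(mdp.succProbReward(3, 'walk'))  # [(4, 1.0, -1)]
print(mdp.succProbReward(3, 'tram'))  # [(6, 0.5, -2), (3, 0.5, -2)]
```

The last three calls reproduce the checks done in the demo for state 3.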
[00:28:06] Okay, is everyone happy with how we're defining MDPs? It's pretty similar to search problems, except now we have these probabilities. All right, so now I have defined an MDP; that's great. The next question we'd generally like to answer is how to give a solution. [00:28:31] There's a question: the Markov part means that you depend only on the current state? Yes. The way we define our state, the state is sufficient for us to make optimal decisions about the future. The Markov part means it's Markov: the probability of ending up in the next state depends only on the current state and action. [00:28:57] So the interesting question we'd like to answer is: we want to find a solution. I want to figure out the optimal way to actually solve this problem. If you remember search problems, the solution to a search problem was just a sequence of actions; that's all I had, a sequence of actions, a path, and that was the solution. The reason that was a good solution was that everything was deterministic, so I could just give you the path and that's what you would follow. But in the case of MDPs, the way we define a solution is by using this notion of a policy. Let me actually write that here. So you've defined an MDP, but now I want to say: what is a solution of an MDP? A solution of a Markov decision process is a policy, pi(s). This policy goes from states: it takes any state and tells me the action I would take in that state. [00:29:58] So the policy is a function, a mapping from each state s in the set of all possible states to an action in the set of all possible actions. In the case of volcano crossing, I can have something like this: I can be in state (1,1), and the policy at that state could be to go south; or I can be in state (2,1) and have a policy for that state too. If this were a search problem, I would just give a path: go south, then go east, then go north, and that would be my solution. But again, if I decide that the policy at (1,1) is to go south, there's no reason you'll actually end up to the south, because this thing is probabilistic. So the best I can do is, for every state, just tell you the best thing to do in that particular state, and that's why we define a policy as opposed to giving a full path.
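The "policy is a mapping from states to actions" idea can be made concrete with a plain dictionary. These particular grid states and directions are illustrative, not the lecture's actual volcano-crossing policy:

```python
# A policy maps every state to an action. For a small grid world it can be
# written out explicitly as a dict; the entries below are made up for
# illustration, in the style of the volcano-crossing example.
policy = {
    (1, 1): 'south',
    (2, 1): 'east',
    (2, 2): 'east',
    (2, 3): 'north',
}

def pi(state):
    # Look up the action the policy prescribes for this state.
    return policy[state]

print(pi((1, 1)))  # 'south'
```

Note that the policy says what to *try*; where you actually land is still up to the probabilistic transitions.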
[00:30:53] All right, so a policy is the thing you're looking for, and ideally I'd like to find the best policy, the one that gives me the right solution. But in order to get there, I want to spend a little bit of time talking about how good a given policy is; that's this idea of evaluating a policy. In this middle section I'm not trying to find a policy. I just assume you give me a policy, and I can evaluate it and tell you how good it is. That's the plan for the middle section. Everyone happy so far? All I've done is define an MDP, which is very similar to a search problem; it's just probabilistic. [00:31:30] So how would we evaluate a policy? If you give me a policy, which basically tells me to take some action at every state s, then that policy is going to generate a random path. In fact I can get many different random paths, [00:31:46] because nature behaves differently each time and the world is uncertain. So I get a bunch of random paths, and those paths are random variables. For each one of those random paths I can define a utility. What is a utility? The utility is just the sum of rewards I get over that path; really it's the discounted sum of rewards. We'll talk about discounting, which lets you discount the future, but for now just assume it's the sum of rewards along the path. [00:32:19] So the utility you get is also going to be a random variable, because the policy generates a bunch of random paths, and the utility is just the sum of rewards of each one of them. [00:32:39] If you remember this example: I can have a path that starts in "in", then stays, and then the game ends. That's one random path, and for that particular random path the utility I get is four dollars. That's one possible thing that can happen. If my policy is, say, "stay", there's no reason for the game to end right there; I can get a lot of different random paths. I can have a situation where I stay three times and after that the game ends, and the utility of that is twelve. We can have the situation stay, stay, end, and there the utility is eight. And so on. So you're getting all these utilities for all these random paths, and these utilities are also just random variables. So I can't really do much with a single random utility.
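The three sample paths just mentioned can be checked numerically. The $4 reward per "stay" comes from the lecture's dice-game example; with no discounting yet, the utility of a path is just the sum of its rewards:

```python
# Utility of one sampled path = sum of the rewards along it (gamma = 1 here).
# The reward of 4 per "stay" follows the lecture's dice-game example.
def path_utility(rewards):
    return sum(rewards)

print(path_utility([4]))        # stay once, then the game ends: utility 4
print(path_utility([4, 4]))     # stay, stay, end: utility 8
print(path_utility([4, 4, 4]))  # stay three times, then end: utility 12
```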
[00:33:28] A random variable isn't telling me nothing, it's telling me something, but I can't optimize a random variable. So instead we need to define something we can actually work with, and that is this idea of a value, which is just an expected utility. The value of a policy is the expected utility of that policy, and that's not a random variable anymore; it's actually a number. I can compute that number for every state and then work with values. [00:34:05] So the question is: when you say "value of a policy", is a policy basically telling me the strategy for all possible states? Well, we're defining a policy as a function of state, and value is the same kind of thing, a function of state. I might ask: what is the value of being in "in"? The value of being in "in" and following the policy "stay" is the expected utility of following "stay" from that particular state, which is basically that twelve up there. I could ask the same about any other state: I can be in any other state and ask what its value is. And when we do value iteration, you'll actually need to compute this value for all states, to have an idea of how to get from one state to another; it tells you what it's worth to be in state "in". [00:35:00] And yes, the policy, given you're in state "in", is to take the action "stay", and that is what the twelve is. We've kind of seen empirically that it's twelve, but we haven't shown how to get twelve yet. [00:35:16] All right, let me actually write these in my list of things. We talked about the policy; what else did we talk about? We talked about utility. What is utility? Utility, we said, is our rewards: if I get reward r1 and then reward r2, it's the discounted sum of rewards. I'm going to use this gamma, the discount I'll talk about in a little bit, so the utility is u = r1 + gamma * r2 + gamma^2 * r3 + ... and so on. You give me a random path, and I just sum up the rewards along it. If gamma is 1, I'm just summing up the rewards; if gamma is not 1, I'm looking at this discounted sum. [00:35:54] So that is utility, and value is just the expected utility: you give me a bunch of random paths, I can compute their utilities, sum them up and average them, and that gives me the value. [00:36:14] That's a very good question; we'll get back to it. In general, if the graph is acyclic, it's fine; but if you have a cyclic graph, you want your gamma to be less than one.
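The discounted sum just written on the board can be expressed as a one-line helper (a sketch, not course code):

```python
# Discounted utility of a path:
#   u = r1 + gamma * r2 + gamma^2 * r3 + ...
def utility(rewards, gamma=1.0):
    return sum(gamma ** t * r for t, r in enumerate(rewards))

print(utility([4, 4, 4]))             # gamma = 1: plain sum, 12.0
print(utility([4, 4, 4], gamma=0.5))  # 4 + 2 + 1 = 7.0
```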
[00:36:25] We'll talk about that when we get to convergence. All right, let's go to this particular volcano crossing example. [00:36:39] In this case I can run the game, and every time I run it I get a different utility, because I end up on some random path; some of them end in the volcano, which is pretty bad, so I get different utility values. But the value, which is the expected utility, isn't really changing: it's just around 3.7, which is the average of these utilities. I can keep running this and keep getting different utilities, but the value is this one number that I can talk about. That's the value of this particular state, and it tells me what the best policy I can take is worth, the best amount of utility I can get, in expectation, from that state.
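The "value is the average of many sampled utilities" idea can be sketched with the dice game. Treat the dynamics as an assumption here: following the lecture's earlier setup, "stay" pays $4 and the game then ends with probability 1/3, which makes the expected utility the twelve discussed above.

```python
import random

# Value = expected utility. Estimate it by averaging the utilities of many
# random paths generated by a fixed policy. Dynamics assumed from the
# lecture's dice game: "stay" pays $4, then the game ends with probability 1/3.
def simulate_stay(rng):
    total = 0
    while True:
        total += 4              # reward for choosing "stay"
        if rng.random() < 1 / 3:  # the die ends the game
            return total

rng = random.Random(0)
utilities = [simulate_stay(rng) for _ in range(100_000)]
value = sum(utilities) / len(utilities)
print(value)  # hovers near the expected utility of 12
```

Each individual utility varies a lot (4, 8, 12, ...), but the average settles down, which is exactly why value, not utility, is the thing to optimize.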
[00:37:29] All right, so we've been talking about this utility. I've actually written it on the board already: utility is a discounted sum of rewards. And we've been talking about this discount factor. The idea of the discount factor is that I might care about the future differently from how much I care about now. For example, if you give me four dollars today and four dollars tomorrow, and that four dollars tomorrow has the same value to me as four dollars today, that's the idea of having a discount of one, gamma equal to 1: you're saving for the future, and the value of things in the future is the same. If you give me four dollars now, or four dollars ten years from now, I care about it as four dollars either way, and I can just add things up. [00:38:19] But it could also be the case that you're in a situation, in a particular MDP, where you don't care about the future as much: maybe you give me four dollars ten years from now and that has no value to me. If that is the case, and you just want to live in the moment and don't care about the values you'll get in the future, that's the other extreme, where this gamma, this discount, is equal to zero. That's a situation where four dollars in the future has zero value to me; I only care about right now, living in the moment, what money I'm going to get. [00:38:56] And in reality you're somewhere in between: we're not purely living in the moment, and we're also not in the case where everything has the same value now and in the future. A balanced life is a setting with some discount factor that's not zero and not one: it discounts values in the future, because the future may not have the same value as now, but we still value things, and four dollars is still worth something in the future. That's where we pick a gamma between zero and one. So it's a design choice: depending on what problem you're in, you might want to choose a different gamma. [00:39:44] It's not really an assessment of risk in that way; it depends on the problem. In a particular problem I might want to get values in the future, because I have some long-term goal I want to get to and I care about the future. It depends: if you're solving a game versus, [00:40:00] I don't know, a robot manipulation problem, the discount factors you'd use might be very different. For a lot of the examples we use in this class, you just choose a gamma that's close to one; for a lot of the problems we end up dealing with, gamma is 0.9, that's the usual. For a very different problem where you don't care about the future, you would just drop it. [00:40:24] Yes, okay, that's a good question: is gamma a hyperparameter that you need to tune? I would say gamma is a design choice. It's not a hyperparameter in the sense that you pick the right gamma and it will do the right thing; you want to pick a gamma that works well with your problem statement. And a gamma of zero is kind of greedy: you're picking the best thing right now, and you just don't care about the future, ever. [00:41:00] It doesn't affect the Markov property. The discount is about the reward, about how much you value reward in the future; it's not about how this state affects the next state. It's still a Markov decision process. Gamma affects the reward you're getting, but it's Markov because if I'm in state s and I take action a, I end up in s', and that doesn't depend on gamma. [00:41:33] All right. So in this section we've been talking about this idea that someone comes in and gives me the policy. The policy is pi, and what I want to do is figure out the value of that policy, and again, value is just expected utility.
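The three regimes just described (gamma of 1, 0, and a middle ground like the usual 0.9) can be compared on one reward stream, reusing the discounted-utility formula from earlier:

```python
# How the discount factor changes the worth of the same reward stream.
def utility(rewards, gamma):
    return sum(gamma ** t * r for t, r in enumerate(rewards))

rewards = [4, 4, 4]
print(utility(rewards, 1.0))            # 12.0: the future counts fully
print(utility(rewards, 0.0))            # 4.0: "live in the moment"
print(round(utility(rewards, 0.9), 2))  # 10.84: the usual middle ground
```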
[00:41:48] Okay, so V pi of s is just the expected utility received by following this policy pi from state s. So I'm not doing anything fancy; I'm not even trying to figure out what pi is. All I want to do is evaluate: if you tell me this is pi, how good is that? What's the value of that? So that's what a value function is. The value of a policy is V pi of s; that's the expected utility of starting in some state s. [00:42:41] And if someone tells me that I'm following policy pi, then I already know that from state s the action I'm going to take is pi of s. So that's very clear: I'll take pi of s, and if I take pi of s, I'm going to end up in some chance node. [00:43:02] And that chance node is a state-action node: it's the state s together with the action, and I've decided the action is pi of s. I'm going to define a new function, this Q function, Q pi of s and a, which is just the expected utility from the chance node. So we've talked about values as expected utilities from actual states; I'm going to talk about Q values as expected utilities from chance nodes. So after you have committed to taking action a, and you then follow policy pi, what is the expected utility from that point on? [00:43:46] Well, from that point on we are in a chance node, so many things can happen, because nature is going to roll its die. Each outcome happens with transition probability T of s, a, s prime, and with that transition probability I'm going to end up in a new state, which I'll call s prime. The value of that state, again the expected utility of that state, is V pi of s prime. [00:44:14] All right, so what are these actually equal to? I've just defined the value as an expected utility and the Q value as an expected utility from a chance node; what are they actually equal to? I'm going to write a recurrence that you're going to use for the rest of the class, so pay attention. There's a question: yes, both of them are expected values; one is just a function of the state, and for the other one you've committed to one action. The reason I'm defining both is that it makes writing my recurrence a little easier, because I have this state-action node and I can talk about the branching from these state-action nodes. All right, so I'm going to write a recurrence. It's not hard, but it's kind of the basis of the next n lectures, so pay attention.
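Restating the two definitions just given, in the lecture's notation:

```latex
\[
V_\pi(s) \;=\; \mathbb{E}\big[\,\text{utility} \mid \text{start in state } s,\ \text{follow } \pi\,\big],
\qquad
Q_\pi(s,a) \;=\; \mathbb{E}\big[\,\text{utility} \mid \text{start in chance node } (s,a),\ \text{then follow } \pi\,\big].
\]
```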
[00:45:12] All right, so V pi of s: what is that equal to? Well, it's equal to zero if I'm in an end state. If IsEnd of s is true, then there is no expected utility; it's equal to zero. That's the easy case. [00:45:30] Otherwise, someone told me to follow policy pi, so the value is just equal to the Q value. In this case V pi of s is just equal to Q pi of s and pi of s; these two are just equal to each other. So the next question one might ask is: what is Q pi of s and a equal to? [00:46:19] Okay, so if I'm at a chance node, there are a bunch of different things that can happen, and I can end up in these different s primes. So if I'm looking for the expected utility, I'm looking for the probability of ending up in each state times the utility of that state, summed up. So that is just equal to the sum over all possible s primes that I can end up at, of the transition probability T of s, a, s prime, times the immediate reward that I'm going to get, Reward of s, a, s prime, plus the value of the next state. What I care about is the discounted value, so I'm going to add gamma times V pi of s prime, because I'm talking about this next state. Is this clear? [00:47:17] Okay, so this is the recurrence we use in policy evaluation. Again, remember: someone came and gave me policy pi, and I just want to know how good policy pi is. I can do that by computing V pi. What is V pi equal to? Someone told me I'm following policy pi, so it's got to be equal to just Q pi. What is Q pi equal to? It's just the expectation over all the places I can end up at: the sum over s primes of the transition probability of ending up in s prime, times the total reward you're getting, which is the immediate reward plus the discounted value of my future, following policy pi from that next state. [00:48:01] All right, so far so good. So that is how I can evaluate this policy. I have these two recurrences, and I can just substitute one into the other. Imagine we are not in an end state. If you're not in an end state, then what is V pi of s equal to? It's just equal to the sum over s primes of the transition probability T of s, pi of s, s prime, times the immediate reward I'm going to get, plus the discounted value, gamma V pi of s prime. [00:48:54] Okay, so this is the recurrence I have; I literally just combined those two and wrote it in green, for the case where you're not in an end state. I have a pi here and a pi on the other side too. And that is the place where I can compute V pi: maybe I can do it iteratively, or maybe I can actually find a closed-form solution for some problems. But that is basically what I'm going to do: I have V pi as a function that depends on V pi of s prime, and I can just solve for this V pi. That allows me to evaluate policy pi. I haven't figured out a new policy; all I have done is evaluate the value of pi.
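Written out, the recurrence from the board (in the lecture's notation) is:

```latex
\[
V_\pi(s) =
\begin{cases}
0 & \text{if } \mathrm{IsEnd}(s),\\[4pt]
Q_\pi\big(s,\pi(s)\big) & \text{otherwise,}
\end{cases}
\qquad
Q_\pi(s,a) = \sum_{s'} T(s,a,s')\,\big[\mathrm{Reward}(s,a,s') + \gamma\, V_\pi(s')\big],
\]

and combining the two, for non-end states:

\[
V_\pi(s) = \sum_{s'} T\big(s,\pi(s),s'\big)\,\big[\mathrm{Reward}\big(s,\pi(s),s'\big) + \gamma\, V_\pi(s')\big].
\]
```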
[00:49:41] All right, so let's go back to this example. Let's say that someone comes in and tells me the policy you've got to follow is stay. So my policy is to stay, and I just want to evaluate that. When you're doing policy evaluation, you've got to compute that V pi for all states. So let's start with V pi of end: that is equal to 0, because we know V pi at the end state is just equal to 0. [00:50:06] Now I want to know: what's V pi of the state in? What is that equal to? That's just equal to Q pi of in and stay, right? So I'm going to replace that. That's just equal to 1/3 times (the immediate reward, which is 4, plus the value of the next state I end up at, which is end in this case) plus 2/3 times (the immediate reward I'm going to get, which is 4 dollars, plus the value of the state I end up at, which is in). [00:50:42] So that is just the sum we have there. V pi of end is 0, so let me just put that 0 in. I only have one state here, so I have this as a function of this one state, in. Having one equation, I can find the closed-form solution for V pi of in: I just move things around a little bit, and then I find that V pi of in is just equal to 12. So that's how you get that 12 I've been talking about. You just found out that if you tell me the policy to follow is stay, then the value of that policy from state in is equal to 12. [00:51:27] Yeah, so the policy is a function of state, and I only have this one interesting state here, which is in.
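Writing out that one equation (with, as in the example, reward $4 on both transitions, gamma = 1, and V pi of end = 0):

```latex
\[
V_\pi(\text{in})
= \tfrac{1}{3}\big(4 + V_\pi(\text{end})\big) + \tfrac{2}{3}\big(4 + V_\pi(\text{in})\big)
= 4 + \tfrac{2}{3}\, V_\pi(\text{in})
\;\;\Longrightarrow\;\;
\tfrac{1}{3}\, V_\pi(\text{in}) = 4
\;\;\Longrightarrow\;\;
V_\pi(\text{in}) = 12.
\]
```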
[00:51:35] So when I define my policy, I need to choose an action for that state: in state in, my policy says you either stay or you quit. [00:51:48] All right, so you can basically do the same thing using an iterative algorithm too. In the previous example it was kind of simple and I just solved for the closed-form solution, but in reality you might have many different states, and then it might be a little more complicated. So we can have an iterative algorithm that allows us to find these V pis. The way we do that is we start with the values for all states equal to 0, and the superscript 0 I put here is the iteration index. So I'm just going to initialize the values of all states to zero, and then iterate for however many steps I'd like to.
[00:52:35] Then what I'm going to do is, for every state — again, remember the value needs to be computed for every state — update my value using the same equation that I have on the board. That equation depends on the value at the previous iteration, so this is just an iterative algorithm that computes new values based on the previous values. I initialize everything to zero, then keep updating the values of all states, and keep going. So it's basically that same equation, but think of it as an iterative update: you run this for multiple rounds, and every round you just update your values. [00:53:22] Here is a pictorial way of looking at it. Imagine you have five states. You initialize all of them to zero; in the first round you get some values and update them; then you keep running this, and eventually you can see that the last two columns are close to each other and you have converged to the true values. So again, someone comes and gives you the policy, you start with values equal to zero for all the states, and then you just update based on your previous values. [00:53:55] Yeah, so how long should we run this? Well, we have a heuristic to figure out how long to run this particular algorithm. One thing you can do is keep track of the difference between your values at the previous iteration and this iteration. If the difference is below some threshold, you can call it done and say you've found the right values. In this case we're looking at the difference between the value at iteration t and the value at iteration t minus 1, and taking the max of that over all possible states, because I want the values to be close for all states. [00:54:33] Is this clear? So, on convergence and the discount factor: how long you should run this to get convergence is also a difficult problem, and it depends on the properties of your MDP. If you have an ergodic MDP, this should work okay, but in general it's a hard question to answer for general Markov decision processes. [00:54:59] Another thing to notice here is that I'm not storing that whole table. The only things I'm storing are the last two columns of the table — V pi at iteration t and V pi at iteration t minus 1 — because those let me check convergence and keep going: I only need my previous values to update my values. [00:55:23] In terms of complexity, this is going to take order of T times S times S prime. Why is that? Because I'm iterating over T time steps, I'm iterating over all my states, and I'm summing over all s primes. So that's the complexity I get. And one thing to notice is that it doesn't depend on the number of actions, and the reason is that you have given me the policy. If you've given me the policy, I don't really need to worry about the number of actions I have. [00:56:05] All right, here is the same example that we have seen: at iteration t equal to 1, in gets the value 4 and end gets 0, and at iteration 2, in gets a slightly better value.
And finally, at an iteration like 100, let's say we get the value 12. Remember, for this particular example we were able to solve the closed form for V of the stay policy from state in, but you could also run the iterative algorithm and get the same value of 12. [00:56:45] Is the number of actions the size of s prime? No. You might end up in very different states, and that depends on your probabilities; the size of s prime is really the size of the set of states — in the worst-case scenario you can go from every state to every state, so just imagine it's the size of S. [00:57:07] Okay, here's the summary so far. Where are we? We have talked about MDPs: these are graphs with states and chance nodes and transition probabilities and rewards. And we have talked about a policy as the solution to an MDP,
which is this function that takes a state and gives us an action okay we [00:57:29] a state and gives us an action okay we talked about value of a policy so value [00:57:31] talked about value of a policy so value of a policy is the expected utility of [00:57:34] of a policy is the expected utility of that policy so so if you talk about [00:57:36] that policy so so if you talk about utility like you have these random [00:57:39] utility like you have these random values before all these random paths [00:57:41] values before all these random paths that you're gonna get for every policy [00:57:42] that you're gonna get for every policy the value of utility is just an [00:57:44] the value of utility is just an expectation over all those random random [00:57:47] expectation over all those random random variables and so far we've talked about [00:57:49] variables and so far we've talked about this idea of policy evaluation which is [00:57:52] this idea of policy evaluation which is just an iterative algorithm to compute [00:57:54] just an iterative algorithm to compute what's the value of a state if you give [00:57:57] what's the value of a state if you give me some policy like how good is that [00:57:58] me some policy like how good is that policy what's the value I'm gonna get at [00:58:00] policy what's the value I'm gonna get at every state all right so that has been [00:58:05] every state all right so that has been all assuming you give me the policy now [00:58:07] all assuming you give me the policy now the thing I want to spend a little bit [00:58:09] the thing I want to spend a little bit of time on is figuring out how to find [00:58:11] of time on is figuring out how to find that policy here we only have a stay or [00:58:22] that policy here we only have a stay or quit if you have a different problem [00:58:24] quit if you have a different problem that they can learn another actually [00:58:27] that they can learn another actually state way or something trade is 
going to [00:58:32] state way or something trade is going to change the value of the policy [00:58:34] change the value of the policy because then you have a new action and [00:58:36] because then you have a new action and then you need to update our policies so [00:58:38] then you need to update our policies so in this case so far I'm assuming that [00:58:40] in this case so far I'm assuming that the set of actions is fixed I'm not like [00:58:42] the set of actions is fixed I'm not like adding new actions right like the way [00:58:44] adding new actions right like the way even with search problems like the way [00:58:45] even with search problems like the way we defined search problems or the way we [00:58:47] we defined search problems or the way we are defining MVPs is I am saying like [00:58:50] are defining MVPs is I am saying like I'm starting with a set up where states [00:58:52] I'm starting with a set up where states are fixed actions are fixed I have stay [00:58:54] are fixed actions are fixed I have stay and create those are like the only [00:58:55] and create those are like the only actions I can take the reward is fixed [00:58:58] actions I can take the reward is fixed transition probabilities are fixed under [00:59:00] transition probabilities are fixed under that scenario then what is the best the [00:59:03] that scenario then what is the best the best policy I can take and best policy [00:59:05] best policy I can take and best policy is just from those set up like they've [00:59:06] is just from those set up like they've already defined actions okay next [00:59:10] already defined actions okay next lecture we will talk about unknown [00:59:12] lecture we will talk about unknown settings like when we have transition [00:59:13] settings like when we have transition probabilities that are not known or [00:59:15] probabilities that are not known or reward functions that are not known and [00:59:16] reward functions that are not known and how we go 
about learning them, [00:59:18] and that would be the reinforcement learning lecture, so next lecture might address some of that. All right, so let's talk about value iteration. So that was policy evaluation; that whole thing was evaluation. Now what I would like to do is get the maximum expected utility and find the policy that gets me the maximum expected utility. Okay, so to do that, I'm going to define this thing that's called the optimal value. Instead of the value of a particular policy, I just want the optimal value, which is the maximum value attained by any policy. So you might have a bunch of different policies; I just want the policy that maximizes the value. Okay, and that is V opt. So let me go back to this example: in parallel to this example of policy evaluation, I want to do value iteration, okay.
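In symbols (a small sketch in the course's usual notation, not written on the board here), the optimal value of a state is the best value any policy achieves from that state:

```latex
V_{\text{opt}}(s) \;=\; \max_{\pi} \, V_{\pi}(s)
```

Value iteration computes this maximum directly, without ever enumerating the policies.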
So I'm going to start from state s [01:00:13] again. State s has V opt of s; okay, that is what I would like to find. Here I had V pi of s; if I'm looking for V opt of s, then I can have multiple actions that can come out of here, and I don't know which one to take. But if I take any of them, if I take this one, it takes me to a chance node (s, a), okay, and then I'm looking for Q opt of (s, a). And from here it's actually pretty similar to what we had right here: I'm in a chance node, anything can happen, right, nature plays, and with some transition probability I'm going to end up in some new state s prime, and I care about the V opt of that. So if I'm looking for this optimal policy, which comes from this optimal value, then I need to find V opt. And if I want to find V opt, well, that depends on what action I'm taking here, but let's say I take one of these,
and if I take one of these, I end up [01:01:25] in a chance node; I have Q opt of that chance node, and then from that point on, with whatever probabilities, I can end up in some s prime. Okay, so I want to write the recurrence for this, similar to the recurrence that we wrote here; it's going to be actually very similar. Okay, so I'm going to start with Q because that is easier. So what is Q opt of (s, a)? That just seems very similar to this previous case. What is it equal to? What was Q pi? Q pi was just a sum of transition probabilities times rewards, right? So what is Q opt? Yeah, so it would just be basically this equation, except I'm going to replace V pi with V opt. So from the chance node I can end up anywhere, based on the transition probabilities, so I'm going to sum over s primes, all possible places that I can end up at. I'm going to get an immediate reward, which is R(s, a, s prime),
and I'm going to discount [01:02:24] the future, but the value of the future is V opt of s prime. Okay, so far so good; that's Q opt. How about V opt, what is that equal to? Well, it's going to be equal to zero if you're in an end state; that's similar to before. So if IsEnd(s) is true, then it is zero. Otherwise, I have a bunch of options here, right? I can take any of these actions and I can get any Q opt. So which one should I pick? Which Q opt should I pick? The one that maximizes, right: I should pick an action from the set of actions of that state that maximizes Q opt. So the only thing that has changed here is that before, someone told me what the policy is, and I just took the Q of that; here I'm just picking the maximum value of Q, and that actually tells me what action to pick. So what is the optimal policy? What should the optimal policy be? I'm going to
call it pi opt. [01:03:48] What is that equal to? It's got to be the thing that maximizes V, right, which is the thing that maximizes this Q, because that gives me the action. So it's going to be the argmax of Q opt of (s, a), where a is in Actions(s). Okay, all right. So this one is policy evaluation: someone gave me the policy, and with that policy I was able to compute V, I was able to compute Q, I was able to write this recurrence, and then I had an iterative algorithm to do things. This is called value iteration; this is to find the policy. How do I do that? Well, I have a value, V opt, for the optimal value that I can get, and it's going to be the maximum over all possible actions I can take of the Q values, and the Q values are similar to before. So I have this recurrence now, and then the optimal policy is just an argmax of Q.
Yeah, [01:05:09] so the question is: what if I have two a's that give me the same thing? I can return any of them; it depends on your implementation of max, so you can return any of them. (We're running five minutes over.) Okay, so the good news is that the slides are the same things that I have on the board. So Q opt is just equal to the sum that we've talked about; for V opt I just add the max on top of Q opt, same story. Okay, and then if I want the policy, then I just do the argmax of Q opt, and that gives me the policy, right. I can have, again, an iterative algorithm that does the same thing; it's actually quite similar to the iterative algorithm for policy evaluation. I just start by setting everything equal to zero, I iterate for some number of times, I go over all possible states, and then I just update my value based on this new recurrence that has a max.
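Written out, the recurrences being recapped here are, in the lecture's notation (T for transition probability, gamma for the discount):

```latex
\begin{aligned}
Q_{\text{opt}}(s, a) &= \sum_{s'} T(s, a, s')\,\bigl[\text{Reward}(s, a, s') + \gamma\, V_{\text{opt}}(s')\bigr] \\
V_{\text{opt}}(s) &=
\begin{cases}
0 & \text{if } \text{IsEnd}(s) \\
\max_{a \in \text{Actions}(s)} Q_{\text{opt}}(s, a) & \text{otherwise}
\end{cases} \\
\pi_{\text{opt}}(s) &= \arg\max_{a \in \text{Actions}(s)} Q_{\text{opt}}(s, a)
\end{aligned}
```

Compared with policy evaluation, the only change is the max over actions in place of following the given policy's action.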
Very [01:06:06] similar to before, I just do this update. One thing is that the time complexity is going to be order of T times S times A times S prime, because now I have this max over all possible actions, so I'm actually iterating over all possible actions, whereas in policy evaluation I didn't have that max; someone would give me the policy, so I didn't need to worry about this. All right, so let's look at coding this up real quick. Okay, so we have this MDP problem we defined; it was a tram problem, it was probabilistic, everything about it was great. So now I just want to do an algorithm section, an inference section, where I code up value iteration, and I can call value iteration on this MDP problem to get the optimal policy. Okay, so I'm going to call value iteration later. All right. [01:07:07] So we initialize: all the values are going to become... (I might skip things to make this
faster.) So we're going to [01:07:15] initialize all the values to just zero, right, because all these values are going to start at 0. I defined a states function, so for all of those the value is just going to be equal to 0; let's initialize with that. Then we're just going to iterate for some number of times, and what we want to do is compute this new value given the old values. So it's an iterative algorithm: we have old values, and we just update the new values based on them. So what should that be equal to? We iterate over our states; if you're in an end state, then what is the value equal to? 0, right. If you're not in an end state, then you're just going to do that recurrence there. Okay, so the new value of a state is going to be equal to the max of what? The Q values. Okay, so new V is just the max of Q over the state's actions. Okay, so now I need to define Q. What does Q do here? Q of a state and an
action is just equal to that sum over [01:08:29] s primes. So it's going to return a sum, a sum over s primes. I defined this successor probability and reward function that gives me the new state, the probability, and the reward, so I'm going to iterate over that and call that up here. So given that I have a state and an action, I can get the new state, the probability, and the reward. What are we summing? You're summing the transition probabilities times the quantity: the immediate reward, which is the reward here, plus my discount times my V, which is the old value of V at s prime, my new state. So that is my Q, that is my V, and that's pretty much done. We just need to check for convergence. To check for convergence, we kind of do the same thing as before: we check if the values V and new V are close enough to each other that we can call it done. I'm going to skip these parts, so you
can basically check whether V minus new V [01:09:32] is within some threshold for all states; if it is, we're done, and otherwise V is set equal to new V. Then we need to read off the policy. So the policy is just the argmax of Q. I'm going to make this a little faster: the policy is just going to be, well, None if you're in an end state, and otherwise it's just going to be the argmax of our Q values. So I'm just writing argmax here; pretty much I'm just returning the action that maximizes the Q. And then we need to spend a bunch of time getting the printing working, so let me actually get... yeah, okay, all right, actually right here. So I'm running this function, and I'm writing out (these are shifted a little weirdly) the states, the values, and then pi, which is the policy. Okay, so it starts off walk, walk, walk. Remember, this is the case where we have a 50 percent probability of the tram failing and a 50 percent probability of the tram working.
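The routine just walked through can be sketched roughly as follows. The method names (states, actions, isEnd, discount, succProbReward) approximate the tram MDP interface described in the lecture; they are assumptions for illustration, not the exact course scaffold:

```python
def valueIteration(mdp, epsilon=1e-10):
    # Initialize all the values to zero.
    V = {state: 0.0 for state in mdp.states()}

    def Q(state, action):
        # Sum over s': transition probability times
        # (immediate reward + discount * old value of s').
        return sum(prob * (reward + mdp.discount() * V[newState])
                   for newState, prob, reward in mdp.succProbReward(state, action))

    while True:
        # Compute the new value of every state from the old values.
        newV = {}
        for state in mdp.states():
            if mdp.isEnd(state):
                newV[state] = 0.0
            else:
                newV[state] = max(Q(state, action) for action in mdp.actions(state))
        # Convergence check: stop once no value moved more than epsilon.
        if max(abs(V[state] - newV[state]) for state in mdp.states()) < epsilon:
            break
        V = newV

    # Read off the policy: argmax over actions of the Q values.
    pi = {}
    for state in mdp.states():
        if mdp.isEnd(state):
            pi[state] = None
        else:
            pi[state] = max(mdp.actions(state), key=lambda a: Q(state, a))
    return V, pi
```

On a problem like the tram MDP, the returned pi is exactly the walk/tram policy discussed here; ties between equally good actions are broken by whichever one max happens to return.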
These are the values [01:10:41] we're going to get, and the policy is still to walk until state five and then take the tram from state five, okay, which is kind of interesting, because the policy of the search problem was the same thing too. Okay, so the thing we can do, actually (let me move this forward a little bit), is we can define this fail probability, which becomes just a variable, so you can play around with this. If you pick different fail probabilities, you're going to get different policies. So, for example, if you pick a fail probability that is large, then the policy is probably going to be to just walk and never take the tram, because the tram is failing all the time; but if you decide to take a fail probability that's close to zero, then this is your optimal policy, which is close to the search problem; it's basically the solution to the search
problem. So [01:11:35] play around with this; the code is online. This was just value iteration, value iteration in use on this problem. Okay, so I'm going to skip this one too. All right, so yeah, and then this is also showing how, over multiple iterations, you can kind of get to the optimal value and the optimal policy using value iteration. So at one iteration it hasn't seen it yet, so it thinks that the optimal value is 1.85; it hasn't updated the values. And at, I don't know, three iterations it gets better, but it still hasn't updated; it still thinks it can't get to the other side. And remember, this is a fail probability of 10 percent. But if I get to, I think, 10 iterations, then it eventually learns that the best policy is to get to 20, and the value is 13.68. And if you go to even higher iterations, after that point it's just fine-tuning, so the values stay around 13. So you can play
around with it, okay, [01:12:41] no problem. Okay, so when does this converge? So, if your discount factor is less than 1, or your MDP graph is acyclic, then this is going to converge. If the MDP graph is acyclic, that's kind of obvious: you're just doing dynamic programming over the full thing, so that's going to converge. If you have cycles, you want your discount to be less than 1, because if you have cycles and your discount is, let's say, 1, and let's say you're getting zero rewards, then you're never going to change; you're never going to move from your state, you're always going to be stuck in your state. And if you have nonzero rewards, you're going to get this unbounded reward and keep going, because you have cycles, and it's just going to end up becoming numerically unstable. So just a good rule of thumb is: pick a gamma that's less than 1.
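One way to see the rule of thumb: if every reward has magnitude at most R_max (an assumed bound, not something from the tram problem itself), then even the utility of an infinite path stays bounded when the discount is below 1:

```latex
\left|\sum_{t=1}^{\infty} \gamma^{t-1} r_t\right|
\;\le\; \sum_{t=1}^{\infty} \gamma^{t-1} R_{\max}
\;=\; \frac{R_{\max}}{1-\gamma}
\qquad (\gamma < 1),
```

whereas at gamma equal to 1, a cycle that keeps collecting nonzero reward makes this sum diverge, which is exactly the unbounded-reward situation just described.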
Then you [01:13:31] kind of get this convergence property, okay. All right, so, yeah, the summary so far: we have MDPs now; we've talked about finding policies rather than paths. Policy evaluation is just a way of computing how good a policy is, and the reason I talk about policy evaluation is that there's this other algorithm called policy iteration, which uses policy evaluation. We didn't discuss that in the class, but it's kind of... not equivalent, but you could use it in a similar manner as value iteration; it has its pros and cons, and so policy evaluation is used in those settings. Do not leave, please, we have more stuff to cover! We have value iteration, which computes this optimal value, which is the maximum expected utility. Okay, and next time we're going to talk about reinforcement learning, and that's going to be awesome; we'll talk about unknown rewards. All right, so that was
MDPs: [01:14:30] doing inference on them and kind of defining them. I'm going back to the last lecture now, just to talk about some of the stuff that we didn't cover last time. Okay, all right. So, if you remember, last time we were talking about search problems, these big search problems where we don't have probabilities, and we talked about A* as a way of just making things faster, and we talked about this idea of relaxations, which was a way of finding good heuristics. So A* had this heuristic; the heuristic was an estimate of the future cost. We wanted to figure out how to find these heuristics, like how do we go about finding this heuristic, and one idea was just to relax everything. That allows you to come up with an easier search problem, or just an easier problem, and that helps you to find what the heuristic is. Okay, so we talked about this idea of removing
constraints, and when you remove [01:15:21] constraints, you can end up in nice situations: in some settings you have a closed-form solution, in some other settings you have just an easier search problem and you can solve that, and in some other settings you have independent subproblems. So when you remove constraints, you have this easier problem; you can solve that easier problem, and that gives you a heuristic. You're not done yet, right? You have a heuristic; you take that heuristic, then change your costs, and just run uniform cost search on your original problem. So solving an easier problem... you're not done when you solve the easier problem; it just helps you find the thing that helps for the original problem. So it's kind of like a multi-step thing. So an example of that is: if you have walls, remove all the walls, and you have an easier problem.
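The "change your costs and run uniform cost search" step is the usual A* construction from the search lectures: given a heuristic h obtained from the relaxed problem, run uniform cost search on the modified edge costs

```latex
\text{Cost}'(s, a) \;=\; \text{Cost}(s, a) + h(\text{Succ}(s, a)) - h(s),
```

so that, roughly, a consistent heuristic keeps these modified costs nonnegative, and every path's total cost shifts by the same constant h(s_end) - h(s_start), which means the minimum-cost path of the original problem is unchanged.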
If you solve [01:16:08] that easier problem, that gives you a heuristic, and in this case, when you knock down these walls, that easier problem has a closed-form solution; you don't need to do anything fancy, you don't need to do uniform cost search or any of that, you just compute the Manhattan distance, and that gives you a heuristic. With that heuristic you go and solve your original problem. That was one example. Another example is when you remove constraints and you have an easier search problem: you don't have closed-form solutions, but you have an easier search problem. So you might have a really difficult search problem with a bunch of constraints that are hard to deal with; remove the constraints. When you remove the constraints, you have a relaxed problem, which is just the original problem without the constraint. That's a search problem; you
You can solve that search problem using uniform cost search or dynamic programming, and solving that allows you to find the heuristic. Again, you're not done yet, right? You take the heuristic, go to the original problem, change the costs, and run uniform cost search. [01:17:05] And one quick example here: when you're computing these relaxed problems, the thing you want to find is the future cost of the relaxed problem, and to do that you have this easier search problem — you still need to run uniform cost search or dynamic programming. In this case, if you decide to run uniform cost search, remember that uniform cost search computes past costs, and here I really want to compute future costs, so you need to do a bunch of engineering to get that working. In this particular case, you need to reverse the relaxed problem.
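The reversal trick can be sketched on the course's tram problem. This is a sketch under assumptions: it takes the walk (cost 1, s → s+1) and tram (cost 2, s → 2s) actions from the earlier lectures, reverses them, and runs Dijkstra-style uniform cost search from the reversed start state N, so that past costs of the reversed problem are future costs of the original:

```python
import heapq

def future_costs(N):
    """Past costs in the reversed problem = future costs in the original.
    Reversed actions: walk s -> s-1 (cost 1), tram s -> s//2 (cost 2, s even)."""
    dist = {N: 0}
    pq = [(0, N)]
    while pq:
        d, s = heapq.heappop(pq)
        if d > dist.get(s, float("inf")):
            continue  # stale queue entry
        succs = []
        if s - 1 >= 1:
            succs.append((s - 1, 1))   # reversed walk
        if s % 2 == 0:
            succs.append((s // 2, 2))  # reversed tram
        for t, c in succs:
            if d + c < dist.get(t, float("inf")):
                dist[t] = d + c
                heapq.heappush(pq, (d + c, t))
    return dist
```

For example, `future_costs(10)[1]` is the cheapest cost of getting from state 1 to state 10 in the original problem.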
When you reverse it, the past cost of the reversed relaxed problem becomes the future cost of the relaxed problem, if that makes sense. [01:17:47] So the way I'm reversing this is I'm basically saying the start state is N and the end state is 1, and my walk action takes me to s minus 1 instead of s plus 1, and my tram action takes me to s over 2 instead of s times 2. The whole reason I'm doing that is that the past cost of this new problem is the future cost of the non-reversed version, okay, because I need to use uniform cost search here. So I run my uniform cost search, that gives me a heuristic, and that heuristic gives me the future cost of the relaxed problem, and everything will be great. [01:18:18] Another example is that I can have independent subproblems. So in this case we have these tiles, and they technically cannot overlap; instead, what we are allowing is for them to overlap.
If we allow them to overlap, I have eight independent subproblems that I can solve; those subproblems give me heuristics, and I can just go with them. [01:18:39] Okay, so these were just a bunch of examples, and the key idea was reducing edge costs: when we come up with these relaxed problems, you're reducing edge costs from infinity to some finite cost. So I'm getting rid of walls: before, I couldn't cross — the cost of that was infinity — but if I get rid of the wall, I'm making it a finite cost. [01:19:02] So this type of method is a general framework. The point I want to make is that generally you can talk about the relaxation of a search problem: if you have a search problem P, a relaxation of that search problem, which I'm going to call P_rel, is going to be a problem where the cost of the relaxation for any state and action is less than or equal to the cost of that state and action. I'll take questions afterwards.
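Written out, the relaxation condition just stated is:

```latex
% P_rel is a relaxation of the search problem P if, for every state s and action a,
\text{Cost}_{\text{rel}}(s, a) \le \text{Cost}(s, a).
```

Removing a wall is a special case: the edge cost drops from infinity to a finite value.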
All right, so that is a relaxed problem. [01:19:28] The cool thing about that is, if you're given a relaxed problem, then you can pick your heuristic to be the future cost of the relaxed problem, and that is called the relaxed heuristic. So this is kind of a recipe, a general framework: if someone asks you to find a good heuristic — find a relaxed problem; the future cost of the relaxed problem is a heuristic. And the cool thing about that is it turns out that the future cost of the relaxed problem, used as a heuristic, is also consistent. We talked about all these consistency properties, and how you want your heuristic to be consistent for the solution to be correct — and how in the world am I going to find a consistent heuristic? Well, here is one way of finding consistent heuristics: pick your problem, make it relaxed.
Making it relaxed means picking a cost that's less: if we can pick a relaxed problem where the cost is less than the cost of the original problem, then the future cost of that relaxed problem is just going to be your heuristic, and it's going to be consistent. The proof of that is two lines; I'll skip that. [01:20:30] And the thing that's nice about this is that there's a trade-off here, a trade-off between efficiency and tightness. Sure, making things relaxed and removing constraints is kind of fun, right? You have this easier problem, you just solve it, and everything is great. But there is a trade-off with how tight you want your heuristic to be: you shouldn't remove too many constraints, because if you remove too many constraints, then your heuristic is not a good estimate of the future cost.
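For completeness, the two-line proof that gets skipped here is the standard one. For any state s and action a, with successor Succ(s, a):

```latex
h(s) = \text{FutureCost}_{\text{rel}}(s)
     \le \text{Cost}_{\text{rel}}(s, a) + \text{FutureCost}_{\text{rel}}(\text{Succ}(s, a))
     \le \text{Cost}(s, a) + h(\text{Succ}(s, a))
```

The first inequality holds because following a and then acting optimally in the relaxed problem is one particular way of proceeding from s; the second is just the relaxation condition. Together with h(s) = 0 at end states, this is exactly the definition of consistency.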
Remember, your heuristic is supposed to be an estimate of the future cost, so if it is not a good estimate of the future cost and it's not tight, then it's not that great. [01:21:04] So there is a balance between how many of your constraints you're removing — and how that makes finding the heuristic easier — versus the fact that you want your heuristic to be tight and close to your future costs. So don't remove everything; leave some constraints and then solve it. And you can also do things like this: if you have two heuristics that are both consistent, you can take the max of them. The max is a little bit tighter — maybe it's closer to your future costs — and you can actually show that the max of two consistent heuristics is also consistent. [01:21:39] Okay, so we talked about relaxation and A*. One quick thing I want
to mention, because it wasn't very clear last time, is the structured perceptron. We talked about that a little bit, and we talked about its convergence. So, quick things on that: the structured perceptron actually converges. [01:21:53] There was this question: if you have a path that is, let's say, walk–tram, and we end up recovering another path that is tram–walk, is that bad or is that good? Well, it turns out that the costs of both of these paths are the same, so if I end up getting this path, that's perfectly fine — it is also optimal under the same optimal weights. In the tram example we showed, I don't think we are able to get two paths that look like this, because of the nature of the example. [01:22:26] So in general, the things to remember about the structured perceptron: it does converge.
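As a reminder of the algorithm under discussion, the structured perceptron update — in its standard form, not verbatim from the lecture — nudges the weights toward the features of the true path and away from the features of the predicted one:

```python
def perceptron_update(w, phi_true, phi_pred):
    """One structured perceptron step: w <- w + phi(true path) - phi(predicted path).
    All three arguments are feature vectors represented as lists of floats.
    If the prediction already matches the truth, the update is a no-op."""
    return [wi + t - p for wi, t, p in zip(w, phi_true, phi_pred)]
```

When two different paths have identical features (like walk–tram vs. tram–walk with the same counts), the update cannot — and need not — distinguish them, which is the point being made above.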
It converges in a way that it can recover the true weights, but it doesn't necessarily get the exact w's, as we saw last time: you might get two and four, or you might get four and eight. As long as you have the same relationships, that is enough; you're not necessarily going to get the actual weights, and it does converge. [01:22:47] So with that, the project conversation is going to be next time. Do take a look at the website — all the information on the project is on the website — so start thinking about it and look at the project page.
================================================================================
LECTURE 020
================================================================================
Markov Decision Processes 2 - Reinforcement Learning | Stanford CS221: AI (Autumn 2019)
Source: https://www.youtube.com/watch?v=HpaHTfY52RQ
---
Transcript
[00:00:04] So this lecture is going to be on reinforcement learning. I will, in the interest of time, skip the quiz. The way to think about how reinforcement learning fits into what we've done so far is: you
remember this class has this picture, right? We talked about different models, and we talked about different algorithms — inference algorithms to be able to predict using these models and answer queries — and then we have learning, which is how you actually learn these models. So for every type of model we go through, we have to check the boxes for each of these pieces. [00:00:44] Last lecture we talked about Markov decision processes. This is a modeling framework that allows you to define models, for example for crossing volcanoes, or playing dice games, or taking trams. What about inference — what do we have here? Last time we just had value iteration, which allows you to compute the optimal policy, and policy evaluation, which allows you to estimate the value of a particular policy. So these are algorithms that operate on the MDP, right, and we looked at these
right and we looked at these algorithms last time so this lecture is [00:01:24] algorithms last time so this lecture is going to be about learning [00:01:26] going to be about learning I'll just put RL for now RL is not an [00:01:30] I'll just put RL for now RL is not an algorithm it's a kind of refers to the [00:01:32] algorithm it's a kind of refers to the family of algorithms that fits in this [00:01:35] family of algorithms that fits in this week but that's a way you should think [00:01:37] week but that's a way you should think about it RL allows you to either [00:01:40] about it RL allows you to either explicitly or implicitly as to my MVP s [00:01:42] explicitly or implicitly as to my MVP s and then once you have that you can do [00:01:44] and then once you have that you can do all these inference algorithms to figure [00:01:49] all these inference algorithms to figure out what the optimal policy is okay so [00:01:53] out what the optimal policy is okay so just to review so what is the MVP the [00:01:58] just to review so what is the MVP the clearest way remember to think about it [00:02:01] clearest way remember to think about it is it's in terms of a graph so you have [00:02:04] is it's in terms of a graph so you have a set of states so in this dice game we [00:02:07] a set of states so in this dice game we have in and n so we have a set of states [00:02:10] have in and n so we have a set of states from every state you have a set of [00:02:13] from every state you have a set of actions coming out so in this case stay [00:02:18] actions coming out so in this case stay and quit actions take you to chance [00:02:22] and quit actions take you to chance nodes where the you don't get to control [00:02:26] nodes where the you don't get to control what happens but nature does and there's [00:02:29] what happens but nature does and there's randomness so out of these chance nodes [00:02:31] randomness so out of these chance nodes are transitions each transition 
takes you into a state. It has some probability associated with it — two-thirds in this case — and it also has some reward associated with it, which you pick up along the way. So naturally this one has to be 1/3, 4, and remember from last time this one was probability 1, 10. [00:02:56] And then there is also the discount factor gamma, a number between 0 and 1 that tells you how much you value the future; by default you can think of it as 1, for simplicity. Okay, so this is a Markov decision process, and what do you do with one of these things? We have a notion of a policy — let's see, I'll write it over here. A policy, denoted pi (let me use green), is a mapping from states to actions. When you apply it, it says: when I land here, where should I go — should I do stay or quit? Well, I mean, this is kind of a simple MDP.
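The dice game just described can be written down as a tiny MDP. This is a sketch assuming the numbers from the course's running example (stay: reward 4, probability 2/3 of returning to "in"; quit: reward 10, game ends):

```python
import random

# Transition table: (state, action) -> list of (probability, next_state, reward).
TRANSITIONS = {
    ("in", "stay"): [(2 / 3, "in", 4), (1 / 3, "end", 4)],
    ("in", "quit"): [(1.0, "end", 10)],
}

def sample_episode(policy, seed=0):
    """Run a policy from 'in' until 'end'; return the list of rewards picked up."""
    rng = random.Random(seed)
    s, rewards = "in", []
    while s != "end":
        a = policy(s)
        r, cum = rng.random(), 0.0
        for p, s2, rew in TRANSITIONS[(s, a)]:
            cum += p
            if r <= cum:          # sample the chance-node outcome
                rewards.append(rew)
                s = s2
                break
    return rewards
```

For example, `sample_episode(lambda s: "quit")` always yields a single reward of 10, while a stay policy yields a random-length run of 4s.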
Otherwise there would usually be more states, and for every state — blue circle — it will tell you where to go. [00:03:53] And when you run a policy, what happens? You get a path, which I'm going to call an episode. So what do you do? You start in state s0 — in this particular example that would be "in" — you take an action a1, let's say stay, you get some reward — in this case it would be four — and you end up in a new state s1; suppose you go back to "in", and then you take another action, maybe it's stay, the reward is four again, and so on. [00:04:31] So this sequence is a path, or in RL-speak, an episode. Let me erase this comment. So this is an episode, until you hit the end state. And what comes out of an episode? You can look at a utility, which we're going to denote u, which is the discounted sum of the rewards along the way. So if you, you know, stayed three times
and then went there, you would have a utility of four plus four plus four plus four, so that would be sixteen. [00:05:19] So last lecture we didn't really work with the episodes and their utility, because we were able to define a set of recurrences that computed the expected utility. Remember that we don't know what's going to happen, so there's a distribution, and in order to optimize something we have to turn it into a number — that's what expectation does. So there are two concepts that we had from last time. One is the value function of a particular policy: V_pi(s) is the expected utility if you follow pi from s. What does that mean? That means if you take a particular s — let's take "in" — and I put you there and you run the policy, so stay, and you traverse this graph, you will have different utilities coming out, and the average of those is going to be V_pi(s).
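The utility of an episode, as just defined, can be computed directly. A minimal sketch; with gamma = 1 it reproduces the 4 + 4 + 4 + 4 = 16 example:

```python
def utility(rewards, gamma=1.0):
    """Discounted sum of rewards: r1 + gamma*r2 + gamma^2*r3 + ..."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))
```

Note the utility is a property of one sampled episode; V and Q below are expectations of this quantity over episodes.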
Similarly, there's a Q-value: the expected utility if you first take an action from state s and then follow pi. What does that mean? That means if I put you on one of these red chance nodes and you basically play out the game and average the resulting utilities that you get — what number do you get? [00:06:35] And we saw recurrences that related these two. So V_pi(s) is a recurrence — the name of the game is to kind of reduce it to some simpler problem. You first look up what you're supposed to do in s, that's pi(s), and that takes you to a chance node, which is (s, pi(s)), and then you say, hey, how much utility am I going to get from that node? And similarly, from the chance nodes you have to look at all the possible successors: the probability of going into that successor, the immediate reward that you get along the edge, plus the discounted value of the future when you end up in s'.
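The two recurrences being described, in the course's notation:

```latex
V_\pi(s) =
\begin{cases}
0 & \text{if } \text{IsEnd}(s) \\
Q_\pi(s, \pi(s)) & \text{otherwise}
\end{cases}
\qquad
Q_\pi(s, a) = \sum_{s'} T(s, a, s') \left[ \text{Reward}(s, a, s') + \gamma\, V_\pi(s') \right]
```

Plugging in the dice game with the always-stay policy and gamma = 1 gives V_pi(in) = (2/3)(4 + V_pi(in)) + (1/3)(4), i.e. V_pi(in) = 12.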
Okay, so any questions about this? This is kind of a review of Markov decision processes from last time. [00:07:40] Okay, so now we're able to do something different. If you say goodbye to the transitions and rewards, that's called reinforcement learning. Remember, with Markov decision processes I give you everything here and you just have to find the optimal policy; now I'm going to make life difficult by not even telling you what rewards and what transitions you're going to get. [00:08:04] So just to give a flavor of what that's like, let's play a game. I'm going to need a volunteer. I'll give you the game, but this volunteer has to have a lot of grit and persistence, because this is not going to be an easy game — it has to be one of those people who, even though you're losing a lot, still won't give up. Okay, so here's how the game
works. For each round r = 1, 2, 3, 4, 5, 6, and so on, you're just going to choose A or B — red pill or blue pill, I guess — and you move to a new state, so the state is here, and you get some reward, which I'm going to show here. And the initial state is (5, 0). Okay, everything clear about the rules of the game? That's reinforcement learning — we don't know anything about how it works. All right, any volunteers? How about you in the front? [00:09:24] [the volunteer plays the game] [00:10:03] I'm glad this worked, because last time it took a lot longer. So what did you have to do? I mean, you don't know, so you try: you try A and B, and then hopefully you're building an MDP in your head, right? Yeah, right, okay — just smile and nod. You have to figure out how the game works, right? So maybe you noticed that, hey, A is
decrementing and B is going up, but then there's this other bit that gets flipped. So can you figure this out? In the process you're also trying to maximize reward, which apparently doesn't come until the very end, because it's a cruel game. [00:10:44] Okay, so how do we design an algorithm to do this, and how do we think about this? Just to make the contrast between MDPs and reinforcement learning sharper: a Markov decision process is an offline thing, right? You already have a mental model of how the world works — that's the MDP, that's all the rewards and the transitions and the states and actions — and you have to find a policy to collect maximum reward. You have it all in your head, so you just think really hard about, you know, what the best thing is: I know if I do this action I'll go here, and you look at the probabilities,
take the max or whatever. [00:11:21] So reinforcement learning is very different: you don't know how the world works, so you can't just sit there and think, because thinking isn't gonna help you figure out how the world works. So you have to just go out and perform actions in the world, right? And in doing so, hopefully you'll learn something, but also you'll get some rewards. Okay, so let's maybe formalize the paradigm of RL. So you can think about it as an agent, that's you, and you have the environment, which is everything else that's not the agent. The agent takes actions, so the agent sends an action to the environment, and the environment just sends you back a reward and a new state, and you keep on doing this. So what you have to do is figure out, first of all: how am I going to act? If I'm in a particular state s_{t-1}, what action should I choose? Okay, so that's one question.
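The interaction loop just described (agent sends an action; environment sends back a reward and a new state) can be sketched in code. The two-state game below, its 1/3 exit chance, and the reward values are my own illustrative assumptions, loosely modeled on the course's dice game, not the in-class demo:

```python
import random

class Environment:
    """A hypothetical two-state game: states "in" and "end" (both assumed)."""

    def __init__(self):
        self.state = "in"

    def step(self, action):
        # "stay": collect reward 4; with probability 1/3 the episode ends.
        if action == "stay":
            reward = 4
            self.state = "end" if random.random() < 1 / 3 else "in"
        else:  # "quit": assumed here to end the episode with reward 10
            reward = 10
            self.state = "end"
        return reward, self.state

def run_episode(policy, env):
    # The loop: agent sends an action, environment sends back (reward, state).
    episode = []  # list of (state, action, reward, next_state) tuples
    state = env.state
    while state != "end":
        action = policy(state)
        reward, next_state = env.step(action)
        episode.append((state, action, reward, next_state))
        state = next_state
    return episode

episode = run_episode(lambda s: "stay", Environment())
print(episode[-1])  # the final transition always lands in "end"
```

The `policy` argument is exactly the "how am I going to act" question: a function from states to actions.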
And then you're gonna get this reward and observe a new state: [00:12:18] what should I do to update my mental model of the world? Okay, so these are the main two questions. I'm gonna talk first about how to update the parameters, and then later in the lecture I'm gonna come back to how you actually go and, you know, explore. Okay, so I'm not gonna say much here, but, you know, in the context of volcano crossing, just to kind of think through things: every time you play the game, right, you're gonna get some utility. So this is the episode over here, the sequence of actions, rewards, and states; sometimes you, you know, fall into a pit, sometimes you go to a hut, and based on these experiences, if I hadn't told you what any of the actions do, or what the probabilities are, or anything, how would you kind of go about solving this problem? That's the question. Okay, so there's a bunch
of algorithms; [00:13:18] I think there's going to be at least four RL algorithms that we're gonna talk about, with different characteristics, but they're all gonna kind of build onto each other in some way. So the first class of algorithms: Monte Carlo methods. Right, so, okay: whenever you're doing RL, or any sort of learning, the first thing is you just have data. Let's suppose that you run even a random policy, because in the beginning you don't know any better, so you're just gonna try random actions, but in the process you're gonna see: hey, I tried this action and I got this reward, and so on. So on a concrete example, just to make things more crisp, it's going to look something like this. You're in 'in', and then you take... let's see, let me try to color-coordinate this. So you're in 'in', you do stay, and you
get a reward of four, [00:14:22] and then you're back in 'in', you do a stay, and then you get four, and then maybe you're done, you're out. Okay, so this is an example episode, just to make things concrete: so this is s0, a1, r1, s1 (I was incrementing too quickly), a2, r2, s2, okay. Okay, so what should you do here? All right, so, any ideas? Model-based Monte Carlo: so if you had the MDP, you would be done, but we don't have the MDP, we have data, so what can we do? [00:15:09] Yeah: let's try to build an MDP from that data. Okay, so the idea is: estimate the MDP. So intuitively, we just need to figure out what the transitions and rewards are, and then we're done, right? So how do you do the transitions? So the transition says: if I'm in state s and I take action a, what will happen? I don't know what will happen, but let's see in the data what happened: so I can go look at the number of times I went into a particular s' and then
divided by [00:15:46] the number of times I attempted this action from that state at all, and just take the ratio. Okay, and for the rewards, this is actually fairly, you know, easy, because when I observe a reward from (s, a, s'), I just write it down and say that's the reward. Okay. Okay, so on the concrete example, what does this look like? So remember, now here's the MDP graph; I don't know what the transition distribution or the rewards are. So let's suppose I get this trajectory: what should I do? So I get stay, stay, stay, stay, and I'm out, okay. So first I can write down the reward of four here, and then I can estimate the probability of, you know, transitioning: so three out of four times I went back to 'in', one out of four times I went to 'end', so I'm gonna estimate this as 3/4, 1/4. Okay, but then suppose I get a new data point, so I have stay, stay, and so what do I do?
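The count-based estimates just described can be sketched as follows; a minimal sketch, where the 'in'/'end' state names and the stay action follow the example on the slide:

```python
from collections import defaultdict

counts = defaultdict(int)        # (s, a, s') -> number of observed transitions
totals = defaultdict(int)        # (s, a)     -> number of times a was tried in s
reward_sums = defaultdict(float) # (s, a, s') -> sum of observed rewards

def observe(s, a, r, s_next):
    counts[(s, a, s_next)] += 1
    totals[(s, a)] += 1
    reward_sums[(s, a, s_next)] += r

def T_hat(s, a, s_next):
    # Estimated transition probability: count of (s, a, s') over count of (s, a).
    return counts[(s, a, s_next)] / totals[(s, a)]

def R_hat(s, a, s_next):
    # Estimated reward: the average observed reward (in case rewards vary).
    return reward_sums[(s, a, s_next)] / counts[(s, a, s_next)]

# The lecture's trajectory: stay four times, landing back in "in" three times,
# then out to "end"; every step pays reward 4.
for s_next in ["in", "in", "in", "end"]:
    observe("in", "stay", 4, s_next)

print(T_hat("in", "stay", "in"), T_hat("in", "stay", "end"))  # 0.75 0.25
print(R_hat("in", "stay", "in"))                              # 4.0
```

Because the counts are cumulative, feeding in later episodes with more `observe` calls updates the ratios exactly as described next.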
I can add to these counts, [00:17:00] so everything is kind of cumulative: so two more times I started, one more time I went into 'in', and another time I went to 'end', so this becomes 4 out of 6 and 2 out of 6. And suppose I see another time where I just go into 'end': so I'm just gonna increment this counter, and now it's 3 out of 7 and 4 out of 7. Okay, so pretty, pretty simple. Okay, so for reasons I'm not going to get into, this process actually, you know, converges; if you do this kind of thing a million times, you'll get pretty accurate estimates. That question: do we know the number of states? Yes, so the question is: you don't know the rewards or the transitions, but yes, you do know the set of states and the actions. The set of states, I guess you don't have to know them all in advance, you can just observe them as they come; the actions you need to know, because you're the agent and you need to play the game. Yeah, good
question. [00:18:03] Okay, so, yeah, so the question is: does this work with variable rewards? And if the reward is not a function of (s, a, s'), you would just take the average of the rewards that you see. Yeah. Okay, so what do you do with this? So after you estimate the MDP, so, you know, you needed the transitions and rewards, now we have an MDP in mind. It may not be the exact right MDP, because this is estimated from data, so it's not gonna match exactly, but nonetheless we already have these tools from last time: you can do value iteration to compute the optimal policy on it, and then, you know, you're done, you run it. In practice you would probably kind of interleave the learning and the optimization, but for simplicity we can think about it as two stages, where you gather a bunch of data, you estimate the MDP, and then you're off. Okay, there's one problem here.
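The two-stage pipeline just described (gather data, estimate the MDP, then run value iteration from last lecture) might look like this on the estimated numbers above; gamma = 1, the iteration count, and the state/action names are my own assumptions for this sketch:

```python
# Value iteration on an estimated MDP (a sketch; T_hat/R_hat are the 3/4, 1/4
# estimates from the worked example, and gamma = 1 is assumed).
T_hat = {("in", "stay"): {"in": 0.75, "end": 0.25}}
R_hat = {("in", "stay", "in"): 4.0, ("in", "stay", "end"): 4.0}
actions = {"in": ["stay"], "end": []}  # "end" is terminal
gamma = 1.0

V = {"in": 0.0, "end": 0.0}
for _ in range(100):  # repeat the Bellman backup until (near) convergence
    V = {s: max((sum(p * (R_hat[(s, a, sp)] + gamma * V[sp])
                     for sp, p in T_hat[(s, a)].items())
                 for a in actions[s]), default=0.0)
         for s in V}
print(round(V["in"], 3))  # 16.0, the value of "in" under the estimated MDP
```

Note the computed value is only as good as the estimated MDP, which is exactly the problem raised next.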
You wanna know what the problem might be? [00:19:17] You can actually see it by looking at the slide. [00:19:25] Yeah: well, with your fixed policy of always staying, you never explore the other branch of the world. Yeah, yeah, you didn't explore this at all, so you actually don't know how much reward is here; maybe it's like, you know, 100, right? So this is actually a pretty big problem: unless you have a policy that actually goes and covers all the states, you just won't know, right? And this is kind of natural, because there could always be, you know, a lot of reward hiding in one state, but unless you see it, you just won't know. Okay, so this is a kind of key idea, a key challenge I would say in reinforcement learning: exploration. So you need to be able to explore the state space. This is different from normal machine learning, where data just
comes in passively and [00:20:20] you learn a nice function and then you're done. Here you actually have to figure out how to get the data, and that's kind of one of the key challenges of RL. So we're gonna go back to this problem, and I'm not really going to try to solve it now. For now, you can just think about pi as a random policy, because with a random policy you eventually will just, you know, hit everything, for, you know, finite small state spaces. Okay, so that's basically the end of the first algorithm; let me just write this over here. So, algorithms: we have model-based Monte Carlo, and the 'model-based' is referring to the fact that we're estimating a model, in particular the MDP; the 'Monte Carlo' part is just referring to the fact that we're using samples to estimate a model, or, you're basically applying the policy multiple times and then estimating the model based on averages.
Okay, [00:21:34] so now I'm going to present a different algorithm, and it's called model-free Monte Carlo. And you might, from the name, guess what we might want to do: maybe we don't have to estimate this model. And why is that? Well, what do we do with this model? What we did was, we, you know, presumably used value iteration to compute the optimal policy. And remember this recurrence for computing Q opt: it's in terms of T and the reward. But at the end of the day, all you need is Q opt. If I told you Q opt of (s, a), and what is Q opt of (s, a)? It's the maximum possible utility I could get if I am in chance node (s, a) and I follow the optimal policy. So clearly, if I knew that, then I would just produce the optimal policy, and you're done; I don't even need to know or understand the rewards and transitions. Okay, so with that insight is model-free
learning, which is that we're just going to try to estimate Q opt, you know, directly. [00:22:52] Sometimes it can be a little bit confusing what is meant by 'model-free': so Q opt itself, you can think about it as a model, but in the context of MDPs and reinforcement learning, generally, when people say model-free, it refers to the fact that there's no MDP model, not that there's no model in general. Okay, so we're not going to get to Q opt yet; that'll come later in the lecture, so let's warm up a little bit. So here's our data, staring at us, remember. Let's look at a related quantity, Q pi. Remember what Q pi is: Q pi of (s, a) is the expected utility if we start at s and you first take action a and then follow policy pi. Right. So, I guess, another way to write this is: if you are at a particular time step t, you can define u_t as the discounted sum of the rewards from that point on, which is,
you know, the reward that you would get immediately, [00:24:00] plus the discounted reward at the next time step, plus the squared-discounted reward two time steps in the future, and so on: u_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + .... And what you can do is, you can try to estimate Q pi from this utility. So this is the utility that you get at a particular time step. So suppose you do the following: suppose you average the utilities that you get, only on the time steps where I was in a particular state s and I took an action a. Okay, so suppose you have a bunch of episodes, right? So here, pictorially, here's another way to think about it: I get a bunch of episodes, and I'm gonna do some abstract drawing here. So every time, you know, (s, a) shows up, maybe it shows up here, maybe it shows up here, maybe it shows up here, you're gonna look at how much reward I get from that point on, how much reward I get from here on,
how much reward do I get from here on, and average them, right? [00:25:13] And there's kind of a technicality, which is that if (s, a) appears here, and it also appears after it, then I'm not going to count that, because if I do both, I'm kind of double counting. In fact it works both ways, but conceptually it's easier to think about if, within the same episode, you don't kind of go back to the same, you know, (s, a) position. Okay, so let's do that on a concrete example. So Q pi, let's just write Q pi of (s, a) as the thing we're trying to estimate, and this is a value associated with every chance node (s, a); so in particular, I've drawn it here, I need a value here and a value here. Okay, so suppose I get some data: I stay, and then I go to the end. So what's my utility here? [00:26:11] It's not a trick question. Four? Yes, the sum of fours is four. Okay, so now I can say: okay, it's four, that's
my best guess so far; [00:26:21] I mean, I haven't seen anything else, maybe it's four. So what happens if I play the game again, and I get four, four? So what's the utility here? Eight. So then I update this to the average of four and eight. Do it again, I get sixteen, then I average in the sixteen. Okay. And, again, you know, I'm using stay, so I don't learn anything about this; in practice you would actually go explore this and figure out how much utility is sitting there. So in particular, notice I'm not updating the rewards nor the transitions, because I'm model-free; I just care about the Q values that I get, which are the values that sit at the nodes, not on the edges. [00:27:06] Okay, so one caveat is that we are estimating Q pi, not Q opt; we'll revisit this later. And another thing to kind of note is the difference between what is called on-policy and off-policy.
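The averaging just described (utilities 4, then 8, then 16, averaged at the chance node) can be sketched as follows; gamma = 1 is assumed so the utilities match the lecture's numbers, and the episode rewards are reconstructed for illustration:

```python
from collections import defaultdict

GAMMA = 1.0  # discount factor; assumed 1 to match the 4, 8, 16 example

def utility(rewards):
    # u_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    return sum((GAMMA ** k) * r for k, r in enumerate(rewards))

# Model-free Monte Carlo: average the utility observed after the first
# occurrence of (s, a) in each episode (a sketch, not the exact class code).
sums = defaultdict(float)
ns = defaultdict(int)

def update(s, a, u):
    sums[(s, a)] += u
    ns[(s, a)] += 1

def Q_hat(s, a):
    return sums[(s, a)] / ns[(s, a)]

# Three episodes from ("in", "stay") with utilities 4, then 8, then 16.
for rewards in [[4], [4, 4], [4, 4, 4, 4]]:
    update("in", "stay", utility(rewards))
print(round(Q_hat("in", "stay"), 3))  # 9.333, i.e. (4 + 8 + 16) / 3
```

No transitions or rewards are stored anywhere: only the Q values at the chance nodes, which is the whole point of being model-free.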
Okay, so in reinforcement learning, [00:27:32] you're always following some policy to get around the world, right? And that's generally called an exploration policy, or the control policy. And then there's usually some other thing that you're trying to estimate, usually the value of a particular policy, and that policy could be the same, or could be different. So on-policy means that we're estimating the value of the policy that we're following to generate the data; off-policy means that we're not. Okay, so, in particular, is model-free Monte Carlo on-policy or off-policy? It's on-policy, because I'm estimating Q pi, not Q opt; okay, that's on-policy. And what about model-based Monte Carlo? I mean, it's a slightly weird question, but in model-based Monte Carlo, we're following some policy, maybe even a random policy, but we're estimating the transitions and rewards, and from that we can compute the
optimal policy. [00:28:44] So you can think about it as off-policy, but, you know, that's maybe not completely standard. Okay, so, any questions about what model-free Monte Carlo is doing? So let me just actually write: what is model-based Monte Carlo doing? It's trying to estimate the transitions and rewards; and model-free Monte Carlo is trying to estimate Q pi. Okay, and just as a note, I put hats on any letter that is supposed to be a quantity that's estimated from data; that's what, you know, statisticians do to differentiate them: whenever I write Q pi without a hat, that's the true value of that, you know, policy, which, you know, I don't have. Okay, any questions about model-free Monte Carlo? Both of these algorithms are pretty simple, right? You just, you know, look at the data and you take averages. Yeah. So the question is: is model-free making changes to a policy,
or is it a fixed box? [00:30:01] So, this version I've given you is only for a fixed policy; the general idea of model-free, as we'll see later, is that you can also optimize the policy. Okay, so now what we're gonna do is theme and variations on model-free Monte Carlo, where it's gonna be the same algorithm, but I just want to interpret it in kind of slightly different ways that will help us generalize it in the future. Yeah: are there problems where model-free doesn't apply, are there certain problems where model-free is better than model-based? So this is actually a really interesting question. So you can show that if your model is correct, if your model of the world is correct, model-based is kind of the way to go, because it will be more sample efficient, meaning that you need fewer data points. But it's really hard to get the model correct in the real world. So recently, especially
the real world so recently especially with neo deep reinforcement learning [00:31:04] with neo deep reinforcement learning people have gotten a lot of mileage by [00:31:07] people have gotten a lot of mileage by just going model free because then jump [00:31:10] just going model free because then jump your head a little bit you can model [00:31:11] your head a little bit you can model this as a kind of a deep neural network [00:31:13] this as a kind of a deep neural network and that gives you extraordinary [00:31:14] and that gives you extraordinary flexibility and power without having to [00:31:17] flexibility and power without having to solve the hard problem of Noah [00:31:18] solve the hard problem of Noah constructing the MDP okay so so there's [00:31:26] constructing the MDP okay so so there's kind of three ways you can think about [00:31:27] kind of three ways you can think about this so the first we already talked [00:31:30] this so the first we already talked about it is you know this average idea [00:31:32] about it is you know this average idea so we're just looking at the utilities [00:31:34] so we're just looking at the utilities that you see whenever you counter in SNA [00:31:37] that you see whenever you counter in SNA and you just average them okay so here [00:31:40] and you just average them okay so here is an equivalent formulation and the way [00:31:45] is an equivalent formulation and the way it works is that for every si you that [00:31:50] it works is that for every si you that you see so every time you see a [00:31:51] you see so every time you see a particular sau sau sau and so on I'm [00:31:56] particular sau sau sau and so on I'm going to perform the following update on [00:31:59] going to perform the following update on so I'm gonna take my existing value and [00:32:01] so I'm gonna take my existing value and I'm going to do a what is called a [00:32:04] I'm going to do a what is called a convex combination so you know 1 - ADA [00:32:07] 
convex combination. [00:32:07] Here 1 − η and η sum to 1, so it's balancing between two things: the old value I had and the new utility u that I saw. And η is set to 1 over (1 plus the number of updates). [00:32:22] So let me do a concrete example; I think it'll make it very clear what's going on. Suppose my data looks like this: I get a 4, then a 1, and a 1. These are the utilities — that's the u here; I'm ignoring the (s, a), I'll just assume it's always the same one. [00:32:42] OK, so first let's assume Q̂_π is zero. The first time, I haven't done any updates yet, so η = 1: (1 − 1) times 0, plus 1 times 4, which is the first u that comes in — so this is 4. [00:33:06] What about the next data point that comes in? Now I take 1/2 times 4 plus 1/2 times 1, which is the new value, and I'm going to write that as (4 + 1)/2. [00:33:26] OK, just to keep track of things: this results in this, this results in this — and now we're running out of space, but hopefully I can fit it. On the third one η is 1/3, so I have 2/3 times (4 + 1)/2, which is the previous value sitting in Q̂_π, plus 1/3 times 1, which is the new value, and that gives me (4 + 1 + 1)/3. [00:34:10] So you can see what's going on here: each time, I have the sum of all the u's I've seen, over the number of times the (s, a) occurs, and η is set so that each update cancels out the old count and adds the new count to the denominator. It all works out so that at every time step, what is actually in Q̂_π is just the plain average of all the numbers I've seen so far.
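The running-average trick just described can be sketched in a few lines of Python (the function name and loop structure are mine, not the course code; only the η = 1/(1 + #updates) schedule comes from the lecture):

```python
def incremental_average(utilities):
    """Maintain q = (1 - eta) * q + eta * u with eta = 1 / (1 + #previous updates).

    Algebraically this is always the plain average of the utilities seen so far.
    """
    q = 0.0  # Q-hat_pi starts at zero, as in the lecture's example
    for num_updates, u in enumerate(utilities, start=1):
        eta = 1.0 / num_updates  # 1 / (1 + number of previous updates)
        q = (1 - eta) * q + eta * u
    return q

# The lecture's data: utilities 4, 1, 1 give (4 + 1 + 1) / 3
print(incremental_average([4, 1, 1]))
```

Stepping through it reproduces the board: 4, then (4 + 1)/2, then (4 + 1 + 1)/3.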
[00:34:43] This is just an algebraic trick to get from the original formulation, which is the notion of an average, to this formulation, which is the notion of taking a little bit of the old thing and adding a little bit of the new thing. [00:35:14] So I'm going to call this the convex combination view — that's the second interpretation. There's a third interpretation, which you can think about in terms of stochastic gradient descent. [00:35:27] This is actually a simple algebraic manipulation: if you look at this expression, you have 1 times Q̂_π, which I'm going to pull out and put down here; then I have minus η times Q̂_π, which is this term; and I also have η times u, so I put a −u here, inside the parentheses.
[00:35:55] If you just do the algebra, you can see that these two are equivalent. So what's the point of this? Where have you seen something like this before — maybe not this exact expression, but something like it? Any ideas? Yeah — [00:36:20] when you looked at stochastic gradient descent in the context of the squared loss for linear regression. Remember we had these updates that all looked like (prediction − target), the residual, and that was used to update. [00:36:37] So one way to interpret this is that it's implicitly doing stochastic gradient descent on an objective which is a squared loss between the Q̂_π value you're trying to set and u, the new piece of data you got. Think about regression: u is the y — the output — and Q̂_π is the model that's trying to predict it, and you want those to be close to each other.
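Written out, the algebra being done on the board is (same symbols as the lecture: η the step size, u the observed utility):

```latex
\hat{Q}_\pi(s,a) \;\leftarrow\; (1-\eta)\,\hat{Q}_\pi(s,a) + \eta\, u
  \;=\; \hat{Q}_\pi(s,a) - \eta\,\big(\hat{Q}_\pi(s,a) - u\big),
```

which is exactly a stochastic gradient step on the squared loss
$\ell = \tfrac{1}{2}\big(\hat{Q}_\pi(s,a) - u\big)^2$, whose gradient with respect to
$\hat{Q}_\pi(s,a)$ is the residual $\hat{Q}_\pi(s,a) - u$.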
[00:37:10] OK, so those are three views on basically this one idea of averaging, or incremental updates. It'll become clear later why I did this — it isn't just to have fun. [00:37:31] So now let's see an example of model-free Monte Carlo in action on the volcano game. Remember we have this as an example, and I'm going to set the number of episodes to, let's say, a thousand. Let's see what happens. [00:37:57] So what does this grid-like structure of triangles denote? This, remember, is a state — this is (2, 1) — and what I'm doing here is dividing it into four pieces corresponding to the four different actions: this triangle is ((2, 1), north), this triangle is ((2, 1), east), and so on. And the number here is the Q̂_π value I'm estimating along the way. [00:38:27] The policy I'm using is completely random — just move randomly — and I've run this a thousand times, and we see that the average utility is minus 18, which is obviously not great. But this is an estimate of how well the random policy is doing, and as advertised, with a random policy you would expect to fall into a volcano quite often. [00:38:58] You can run this and sometimes get slightly different results, but it's pretty much stable around minus 19, minus 18. Any questions about this before we move on to different algorithms? [00:39:15] OK: in model-based Monte Carlo we're estimating the MDP; in model-free Monte Carlo we're just estimating the Q-values of a particular policy — for now. [00:39:31] So let's revisit what model-free Monte Carlo is doing. If you use the policy π equals "stay" for the dice game, you might get a bunch of different trajectories that come out; these are
possible episodes, and each episode has a utility associated with it, and what model-free Monte Carlo is doing is using those utilities to update Q̂_π. [00:40:11] So in particular, for example, here you're saying: I'm in the "in" state and I take the action "stay" — what will happen? Well, in this case I got 16 and in this case I got 12, and notice there's quite a bit of variance. [00:40:29] On average this actually does the right thing: just by definition, this is an unbiased estimate, and if you do this a million times, on average you're going to get the right value, which is 12 in this case. But the variance is huge, so if you only do it a few times you're not going to get 12 — you might get something off to one side. [00:40:53] So how can we counteract all this variance? The key idea behind what we're going to call bootstrapping is that we actually have some more information here: we have this Q̂_π that we're estimating along the way. [00:41:14] So this view is saying: we're trying to estimate Q_π, and we're going to regress it against this data we're seeing — but can we actually use Q̂_π itself to help reduce the variance? [00:41:37] So the idea here is: I'm going to look at all the places where I started in "in" and took "stay", and I get a 4. OK, so I get a 4, but then after that point I'm actually just going to substitute this 11 in. [00:41:55] This is kind of weird, right? Because normally I would just see what happens — and what happens is random: on average it's going to be right, but in any given case I might get, like, you know, 24 or something. The hope is that by using my current estimate — which isn't going to be exactly right, because if it were right I'd be done, but hopefully it's somewhat right — that will be better than using the raw rollout value. [00:42:36] Yeah, question? Yes — the question is: would you update the estimate after each episode? Yes. For all these algorithms — I haven't been explicit about it — you see an episode, you update after you see it, then you get a new episode, and so on. Sometimes you would even update before you're done with the episode. [00:43:00] OK, so let me show you this algorithm. This is a new algorithm called SARSA. Does anyone know why it's called SARSA? Yeah, right: if you look at it, it spells S-A-R-S-A, and that's literally the reason it's called SARSA. [00:43:24] So what does this algorithm say? You're in a state s, you took an action a, you got a reward r, you ended up in a state s′, and then you took another action a′. For every such quintuple (s, a, r, s′, a′) that you see, you perform this update. [00:43:37] So what is this update doing? This is the convex combination, remember, that we saw before, where you take a part of the old value and merge it with the new value. What is the new value here? It's looking at just the immediate reward — not the full utility, just the immediate reward, which is this 4 here — plus the discount (which is 1 for now) times your estimate. [00:44:04] And remember what the estimate is trying to be: the expectation of the rewards you will get in the future. So if this were actually Q_π rather than Q̂_π, then this would be strictly better, because it would just be reducing the variance. But of course it's not exactly right — there is bias; it's 11, not 12 — but the hope is that it's not biased by too much. [00:44:36] So these would be the values you'd be updating with, rather than these raw values here. [00:44:48] Any questions about what SARSA is doing before I move on? Maybe I'll write something to try to be helpful here: model-free Monte Carlo estimates Q̂_π based on u, and SARSA still estimates Q̂_π, but based on the reward r plus, essentially, Q̂_π. This is not a valid expression, but hopefully the symbols will evoke the right memories. [00:45:31] OK, so let's discuss the differences.
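The SARSA update just described can be sketched in Python (a minimal sketch: the tabular dictionary representation and the fixed η are my assumptions, not the course code):

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, eta=0.5, gamma=1.0):
    """One SARSA update for the quintuple (s, a, r, s', a').

    The target is the immediate reward plus the discounted bootstrapped
    estimate Q-hat_pi(s', a'), not the full rolled-out utility u.
    """
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] = (1 - eta) * Q[(s, a)] + eta * target
    return Q[(s, a)]

# Lecture example: in state "in", take "stay", get reward 4, and
# substitute in the current estimate Q-hat_pi("in", "stay") = 11.
Q = defaultdict(float)  # unseen (s, a) pairs default to 0
Q[("in", "stay")] = 11.0
sarsa_update(Q, "in", "stay", 4, "in", "stay")
```

Because the update only needs the local window (s, a, r, s′, a′), it can run after every transition, without waiting for the episode to end.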
[00:45:38] Whenever people say "bootstrapping" in the context of reinforcement learning, this is kind of what they mean: instead of using u as the prediction target, you use r plus Q̂_π. This is you pulling yourself up by your bootstraps, because you're trying to estimate Q_π, and you don't know Q_π, so you use Q̂_π itself. [00:45:58] OK. So u is based on one path, whereas in SARSA you're based on the estimate, which is based on all your previous experience. Which means that model-free Monte Carlo is unbiased but SARSA is biased, and model-free Monte Carlo has large variance while SARSA has smaller variance. [00:46:26] And one consequence of the way the algorithms are set up is that with model-free Monte Carlo you have to roll out the entire game — basically play the game, or the MDP, until you reach a terminal state — and only then do you have your u to update with. Whereas with SARSA, or any sort of bootstrapping algorithm, you can update immediately, because all you need to see is this very local window of (s, a, r, s′, a′); the update can happen anywhere, and you don't have to wait until the very end to get the value. [00:47:04] OK, so just as a quick sanity check: which of the following algorithms allows you to estimate Q_opt — model-based Monte Carlo, model-free Monte Carlo, or SARSA? [00:47:26] I'll give you maybe 10 seconds to ponder this. OK, let's get a report — I think I didn't reset it from last year, so this includes last year's participants. [00:47:57] So: model-based Monte Carlo allows you to get Q_opt, because once you have the MDP you can get whatever you want, including Q_opt. Model-free Monte Carlo estimates Q_π; it doesn't give you Q_opt. And SARSA also estimates Q_π; it doesn't give you Q_opt either. [00:48:17] All right, so that's kind of a
so that's a kind of a problem like I mean these algorithms are [00:48:24] problem like I mean these algorithms are fine for estimating the value of a [00:48:28] fine for estimating the value of a policy but you really want the optimal [00:48:33] policy but you really want the optimal policy right in fact these can be used [00:48:36] policy right in fact these can be used to improve the policy as well because [00:48:38] to improve the policy as well because you can do something called policy [00:48:41] you can do something called policy improvement which I didn't talk about [00:48:43] improvement which I didn't talk about once you have the Q values you can [00:48:45] once you have the Q values you can define the new policy based on the Q [00:48:47] define the new policy based on the Q values but there's actually a kind of a [00:48:49] values but there's actually a kind of a more direct way to do this okay so so [00:48:53] more direct way to do this okay so so here's the kind of the way mental [00:48:55] here's the kind of the way mental framework you should have in your head [00:48:57] framework you should have in your head so there's two values Q PI and Q opt so [00:49:00] so there's two values Q PI and Q opt so in MDPs we saw that policy evaluation [00:49:03] in MDPs we saw that policy evaluation allows you to get Q PI value iteration [00:49:05] allows you to get Q PI value iteration get allows you get a Q opt and now we're [00:49:08] get allows you get a Q opt and now we're doing reinforcement learning here we saw [00:49:10] doing reinforcement learning here we saw a model free Monte Carlo in star sellout [00:49:12] a model free Monte Carlo in star sellout you get Q PI and now we need I'm gonna [00:49:16] you get Q PI and now we need I'm gonna show you a new algorithm called Q [00:49:17] show you a new algorithm called Q learning that allows you to get Q optin [00:49:26] so this gives you Q opt and it's based [00:49:30] so this gives you Q opt and it's based on 
reward plus Q opt okay so this is [00:49:38] on reward plus Q opt okay so this is going to be very similar to sarsa it's [00:49:41] going to be very similar to sarsa it's only gonna differ by essentially as you [00:49:44] only gonna differ by essentially as you might guess the same difference between [00:49:46] might guess the same difference between policy evaluation and value iteration [00:49:48] policy evaluation and value iteration okay so it's helpful to go back to kind [00:49:53] okay so it's helpful to go back to kind of the MVP recurrences so even though [00:49:55] of the MVP recurrences so even though MVP recurrences can only apply when you [00:49:57] MVP recurrences can only apply when you know the MVP for deriving and [00:49:59] know the MVP for deriving and reinforcement learning algorithms it's [00:50:01] reinforcement learning algorithms it's they can kind of give you inspiration [00:50:03] they can kind of give you inspiration for the actual algorithm okay so [00:50:06] for the actual algorithm okay so remember a Q opt what is Q opt to the Q [00:50:09] remember a Q opt what is Q opt to the Q opt is considering all possible [00:50:11] opt is considering all possible successors the probability immediate [00:50:13] successors the probability immediate reward plus future returns okay so the Q [00:50:17] reward plus future returns okay so the Q learning is this actually really kind of [00:50:19] learning is this actually really kind of clever idea and it's it could also be [00:50:23] clever idea and it's it could also be called czars stars I guess but maybe you [00:50:26] called czars stars I guess but maybe you don't want to call it that [00:50:29] and what it does is as follows so this [00:50:33] and what it does is as follows so this has the same form the convex combination [00:50:35] has the same form the convex combination of the old value and the the new value [00:50:40] of the old value and the the new value right so what is the new value so if you 
[00:50:46] right so what is the new value so if you look at Q opt Q opt is looking at [00:50:49] look at Q opt Q opt is looking at different successors reward plus V opt [00:50:52] different successors reward plus V opt what we're gonna do is well we don't [00:50:55] what we're gonna do is well we don't have all we're not going to be able to [00:50:56] have all we're not going to be able to some of our successors because when our [00:50:58] some of our successors because when our reinforcement learning setting and we [00:50:59] reinforcement learning setting and we only saw one particular successor so [00:51:02] only saw one particular successor so let's just use that successor so on that [00:51:04] let's just use that successor so on that successor we're going to get the reward [00:51:06] successor we're going to get the reward so R is a stand-in for the actual reward [00:51:09] so R is a stand-in for the actual reward I mean is the stand-in for the reward [00:51:12] I mean is the stand-in for the reward reward function and then you have gamma [00:51:15] reward function and then you have gamma times and then V opt I am going to [00:51:19] times and then V opt I am going to replace it with our estimate of what V [00:51:23] replace it with our estimate of what V opt is and what should it be estimate of [00:51:26] opt is and what should it be estimate of V off to be so what relates V up to Q [00:51:35] V off to be so what relates V up to Q opt yeah yeah exactly so you define V [00:51:47] opt yeah yeah exactly so you define V off to be the max over all possible [00:51:49] off to be the max over all possible actions of Q opto of s in that [00:51:51] actions of Q opto of s in that particular action then this is V opt [00:51:55] particular action then this is V opt right so Q is saying I'm in a chance [00:51:57] right so Q is saying I'm in a chance node how much what is optimal utility I [00:52:01] node how much what is optimal utility I can get provided I took an action 
Clearly the best thing to do if you're at a state is just to choose the action that gives you the maximum Q-value. [00:52:15] Okay, so that's just Q-learning. So let's put it side by side with SARSA; these two are very similar. SARSA, remember, updates against r plus gamma Q_pi, and now we're updating against r plus this max over Q_opt. And you can see that SARSA requires knowing what action I'm going to take next, kind of a one-step lookahead a', and it plugs that in here, whereas with Q-learning it doesn't matter what action you take next, because I'm just going to take the one that maximizes. So you can see why SARSA is estimating the value of a policy: the a' that shows up here is a function of the policy, and here I'm kind of insulated from that, because I'm just taking the maximum over all actions.
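Side by side, the two tabular updates can be sketched roughly like this (a minimal sketch, not the course's reference code; I'm storing Q in a dictionary and the variable names are mine):

```python
# Sketch of the two tabular updates, side by side.
# Q: dict mapping (state, action) -> current estimate; eta: step size; gamma: discount.

def sarsa_update(Q, s, a, r, s_next, a_next, eta=0.5, gamma=1.0):
    # On-policy: bootstrap with the action a_next the policy will actually take.
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] = (1 - eta) * Q[(s, a)] + eta * target

def q_learning_update(Q, s, a, r, s_next, actions, eta=0.5, gamma=1.0):
    # Off-policy: bootstrap with the maximizing action, whatever we actually do next.
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] = (1 - eta) * Q[(s, a)] + eta * target
```

The only difference is the target: SARSA plugs in the a' it will actually take, while Q-learning maximizes over actions, which is why it estimates the optimal policy even while following some other one.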
This is the same intuition as for value iteration versus policy evaluation. [00:53:19] Okay, I'll pause here, any questions? Q-learning versus SARSA: is Q-learning on-policy or off-policy? It's off-policy, because I'm following whatever policy I'm following, and yet I get to estimate the value of the optimal policy, which is probably not the one I'm following. [00:53:46] Okay, so let's look at the example here. So here's SARSA, run for a thousand iterations, and like model-free Monte Carlo, the average utility I'm getting is minus 20, and in particular the values I'm getting are all very negative, because this is Q_pi, and the policy I'm following is the random policy. If I replace this with Q-learning, what happens? [00:54:14] So first notice that the average utility is still minus 19, because I actually haven't changed my exploration policy; I'm still doing random exploration.
But notice that the Q_opt values are all around 20, right? And this is because the optimal policy, remember (this is with slip probability 0), is just to go down here and get your 20. And I guess it's kind of interesting that with Q-learning I'm just blindly following the random policy, running off the cliff into the volcano all the time, but I'm learning something: I'm learning how to behave optimally even though I'm not behaving optimally. And that's kind of a hallmark of off-policy learning. [00:55:07] Okay, so any questions about these four algorithms? Model-based Monte Carlo estimates the MDP; model-free Monte Carlo estimates the Q-value of the policy based on the actual returns that you get, the actual sum of the rewards; SARSA is bootstrapping, estimating the same thing but with kind of a one-step lookahead.
And Q-learning is like SARSA, except I'm estimating the optimal policy instead of a fixed policy pi. Yeah? [00:55:49] SARSA is on-policy because I'm estimating Q_pi. All right, okay, so now let's talk about covering the unknown. So these are the algorithms; at this point, if I just hand you some data, if I told you here's a fixed policy and here's some data, you can actually compute all these quantities. But now there's the question of exploration, which we saw was really important, because if you don't even see all the states, how can you possibly act optimally? So which exploration policy should you use? Here are kind of two extremes. [00:56:33] The first extreme is to just set the exploration policy greedily. So imagine we're doing Q-learning now, so you have this Q_opt estimate; it's not the true Q_opt, just an estimate.
The naive thing to do is to just use that Q_opt, figure out which action is best, and always do that action. Okay, so what happens when you do this is you don't do very well. So why don't you do very well? Because initially you explore randomly, and soon you find the 2, and once you've found that 2 you say, well, 2 is better than zero, zero, zero, so I'm just going to keep on going down to the 2, which is all exploitation, no exploration. You don't realize that there's all this other stuff over here. [00:57:30] So in the other direction we have no exploitation, all exploration. Here you kind of have the opposite setup, where I'm running Q-learning, and as we saw before I'm actually able to estimate the Q_opt values, so I learn a lot. But the average utility, which is the actual utility I'm getting by playing this game, is pretty bad: it's the utility you get from just moving randomly.
So kind of what you really want to do is balance exploration and exploitation. [00:58:17] Just as a side commentary, I really feel reinforcement learning captures our life pretty well, because in life you don't know what's going on, you want to get rewards, you want to do well, but at the same time you have to learn about how the world works so that you can improve your policy. So if you think about going to restaurants, or finding a shorter path or a better way to get to school or to work, or even in research, when you're trying to figure out a problem: do you work on the thing that you know how to do and will definitely work, or do you try something new in hopes of learning something, even though maybe it won't get you as high a reward?
So hopefully reinforcement learning is, I don't know, kind of a metaphor for life. [00:59:13] Anyways, okay, so back to concrete stuff. So here is one way you can balance exploration and exploitation. It's called the epsilon-greedy policy, and it assumes that you're doing something like Q-learning, so you have these Q_opt values. The idea is that with probability 1 minus epsilon, where epsilon is, let's say, 0.1, you're just going to exploit, do the best action given everything you have, and then once in a while, with probability epsilon, you're just going to do something random. Okay, so this is actually not a bad policy for acting in life: once in a while maybe you should just do something random and see what happens. [01:00:01] So if you do this, what do you get? What I've done here is set epsilon to start at 1, so that's all exploration.
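As a sketch, the epsilon-greedy rule just described might look like this (my own minimal version, with Q stored as a dictionary):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    # With probability epsilon: explore with a uniformly random action.
    if random.random() < epsilon:
        return random.choice(actions)
    # Otherwise: exploit the current estimate, arg max over actions of Q(state, a).
    return max(actions, key=lambda a: Q[(state, a)])
```

Decaying epsilon over time, as in the demo that follows, just means passing in a smaller epsilon as learning progresses.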
And then a third of the way in I'm going to change the value to 0.5, and two-thirds of the way in I'm going to change it to 0. Okay, so if I do this, then I actually estimate the values really well, and I also get utility that's pretty good, you know, 32. And this is also kind of something that happens as you get older: you tend to explore less and exploit more; it just happens. [01:00:56] All right, so that was exploration. So let's put some stuff on the board here. Do I need this anymore? Maybe this, okay. [01:01:11] Okay, so covering the unknown. We talked about exploration, epsilon-greedy, and there are other ways to do this; epsilon-greedy is just kind of the simplest thing that actually works remarkably well, even in state-of-the-art systems. [01:01:36] The other problem I'm going to talk about now is generalization. So remember what we said about exploration.
If you don't see a particular state, then you don't know what to do in it. But think about that for a moment; it's kind of unreasonable, because in life you're never going to be in the exact same situation twice, and yet we need to be able to act properly. So the general problem is that the state space you might deal with in a real-world situation is enormous, and there's no way you're going to track down every possible state. [01:02:10] Now, this state space is actually not that enormous, but it's the biggest state space I could draw on the screen, and you can see that the average utility is pretty bad here. So what can we do about this? Let's talk about large state spaces; this is the problem. [01:02:36] And this is where the third interpretation of model-free Monte Carlo will come in handy.
So let's take a look at Q-learning. [01:02:49] In the context of SGD it looks like this: it's kind of a gradient step, where you take the old value and subtract eta times something that looks like it could be a gradient, which is the residual here. One thing to note is that under the formulation of Q-learning I've talked about so far, this is what we'd call rote learning, which, two weeks ago, we said is kind of ridiculous, because it's not really learning; we're not generalizing at all. Basically, for every single state and action I have a value, and if I have a different state and action, a completely different value. There's no sharing of information, and naturally if I do that I can't generalize between states and actions. [01:03:45] Okay, so here's the key idea that will allow us to overcome this.
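To make that gradient-step reading concrete, the tabular update can be rewritten (in my notation, with target t) as:

```latex
% Tabular Q-learning as a gradient-style step (same update, rearranged):
\hat{Q}_{\text{opt}}(s,a) \;\leftarrow\; (1-\eta)\,\hat{Q}_{\text{opt}}(s,a) + \eta\,t
\;=\; \hat{Q}_{\text{opt}}(s,a) - \eta\,\underbrace{\bigl(\hat{Q}_{\text{opt}}(s,a) - t\bigr)}_{\text{residual}},
\qquad t = r + \gamma \max_{a'} \hat{Q}_{\text{opt}}(s',a').
```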
It's called function approximation in the context of reinforcement learning; in normal machine learning it's just called machine learning. The way it works is this: we're going to define this Q_opt(s, a) not as a lookup table; it's going to depend on some parameters w, and I'm going to define this function to be w dot phi(s, a). [01:04:16] So I'm going to define this feature vector very similarly to how we did in the machine learning section, except instead of (s, a) we had x. So what kind of features might you have? You might have, for example, features on actions; these are indicator features, saying maybe it's better to go east than to go west, or maybe it's better to be in the fifth row, or it's good to be in the sixth column, and things like that.
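For instance, the indicator features just mentioned could be written as a sparse feature map (a hypothetical example; the feature names are mine):

```python
def phi(state, action):
    # Hypothetical indicator features for a grid world whose state is (row, col).
    row, col = state
    return {
        "action=" + action: 1.0,  # e.g. "is the agent moving east?"
        "row=" + str(row): 1.0,   # e.g. "is the agent in the fifth row?"
        "col=" + str(col): 1.0,   # e.g. "is the agent in the sixth column?"
    }
```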
So you have a smaller set of features, and you try to use that to generalize across all the different states that you might see. [01:05:03] So what this looks like now, with the features, is actually the same as before, except now we have something that really looks like the machine learning lectures: you take your weight vector and you do an update of the residual times the feature vector. How many of you find this familiar from linear regression? All right, so just as a contrast: before, we were just updating the Q_opt values, and the residual was exactly the same; now what we're doing is updating not the Q-values but the weights. The residual is the same, and the thing that connects the Q-values with the weights, through the residual, is the feature vector.
[01:06:00] Okay, as a sanity check, this has the same dimension: this is a vector, this is a scalar, and this is a vector which has the same dimensionality as w. And if you want to derive this, you can think of the implied objective function as simply linear regression: you have a model that's trying to predict a value from an input (s, a), so (s, a) is like x, the prediction Q_opt(s, a; w) is like the model output, and this target is like the y that you're trying to predict; you're just trying to make the prediction close to the target. [01:06:45] Yeah, question? Yeah, so a good question: what is this eta now, is it the same as before or is it new? When we first started talking about these algorithms, eta was supposed to be one over the number of updates and so on, but once you get into the SGD form like this, it just behaves as a step size that you can tune to your heart's content.
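Putting the pieces together, one Q-learning step with linear function approximation might be sketched like this (my own minimal version: the weights and features are sparse dictionaries, and phi is any feature map from (state, action) to such a dictionary):

```python
def q_learning_step(w, phi, s, a, r, s_next, actions, eta=0.1, gamma=1.0):
    # Q(s, a; w) = w . phi(s, a), with w and phi(s, a) as sparse dicts.
    def q(state, action):
        return sum(w.get(f, 0.0) * v for f, v in phi(state, action).items())

    # Bootstrapped target: r + gamma * max over a' of Q(s', a'; w).
    target = r + gamma * max(q(s_next, b) for b in actions)
    residual = q(s, a) - target

    # Gradient-style update: w <- w - eta * residual * phi(s, a).
    for f, v in phi(s, a).items():
        w[f] = w.get(f, 0.0) - eta * residual * v
```

With one indicator feature per (state, action) pair this reduces exactly to the tabular update; with shared features it generalizes across states you have never visited.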
[01:07:13] All right, so that's all I'll say about these two challenges. One is how you do exploration: you can use epsilon-greedy, which allows you to balance exploration with exploitation. And the second is that for large state spaces, epsilon-greedy isn't going to cut it, because you're not going to see all the states even if you try really hard, and you need something like function approximation to tell you about new states that you fundamentally haven't seen before. [01:07:53] Okay, so, summary so far. We're in the online setting; this is the game of reinforcement learning: you have to learn and take actions in the real world. One of the key challenges is the exploration-exploitation trade-off. We saw four algorithms, and there are two key ideas here. One is Monte Carlo, which is that from data alone you can basically use averages to estimate quantities that you care about, for example transitions, rewards, and Q-values.
And the second key idea is bootstrapping, which shows up in SARSA and Q-learning: you're updating towards a target that depends on your estimate of what you're trying to predict, not just the raw data that you see. [01:08:45] Okay, so now I'm going to step back a little bit and talk about reinforcement learning in the context of some other things. There are two things that happened when we went from binary classification, which was two weeks ago, to reinforcement learning now, and it's worth decoupling them: one is state and one is feedback. [01:09:07] The idea of partial feedback is that you can only learn about actions you take. This is kind of obvious in reinforcement learning: if you don't quit in this game, you never know how much money you would get.
And the other idea is the notion of state, which is that your rewards depend on your previous actions; if you're going through the volcano example, you're in a different situation depending on where you are in the map. [01:09:52] So you can draw a two-by-two grid, where you start from supervised learning, which is stateless with full feedback. There's no state: every iteration you just get a new example, and the prediction has no dependency on the previous examples. And it's full feedback, because in supervised learning you're told the correct label; even if there might be a thousand labels, for example in image classification, you're still told which one is the correct label.
And now in reinforcement learning, both of those are made harder. There are two other interesting points. What's called multi-armed bandits you can think of as a warm-up to reinforcement learning, where there's partial feedback but no state, which makes it easier. And you can also get full feedback but with state: in structured prediction, for example machine translation, you're told what the translation output should be, but clearly actions depend on previous actions, because you can't just translate words in isolation, essentially. [01:11:08] Okay, so one of the things I'll mention very briefly is deep reinforcement learning, which has been very popular in recent years. There was a lot of interest in reinforcement learning in the 90s, when a lot of the algorithms and the theory were developed.
Okay, so one of the things I'll just mention very briefly is that deep reinforcement learning has been very popular in recent years. In reinforcement learning there was a lot of interest in the 90s, when a lot of the algorithms and the theory were developed; then there was a period where not as much happened, and since, I guess, 2013 there's been a revival of reinforcement learning research. A lot of that is due to DeepMind, where they published a paper showing how they could use deep reinforcement learning to play Atari. This will be talked about more in section this Friday, but the basic idea of deep [01:11:55] reinforcement learning, just to demystify things, is that you're using a neural network for Q_opt — essentially that's what it is. There are also a lot of tricks to make this work, which are necessary when you're dealing with enormous state spaces. One of the things that's different about deep reinforcement learning is that people are much more ambitious about handling problems where the state spaces are enormous. Here the state is just the
pixels, right, so there's a huge number of possible states, whereas before people were in what is known as the tabular case, where there's a number of states you can actually enumerate. So there are a lot of details here to take care of. [01:12:42] One general comment is that reinforcement learning is, honestly, really hard, because of this statefulness and also the delayed feedback. So when you're maybe thinking about final projects: it's a really cool area, but don't underestimate how much work and compute you need. [01:13:03] Some other things I won't have time to talk about: so far we've talked about methods that try to estimate the Q function; there's also a way to do without the Q function and just try to estimate the policy directly — those are methods like policy gradient.
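To demystify "a neural network for Q_opt" a little further, here is a sketch of Q-learning with a function approximator on a made-up five-state chain (illustrative only — a real deep-RL system would replace the one-hot features and linear weights below with a neural network, plus tricks such as replay buffers):

```python
import random

# Made-up 5-state chain MDP: states 0..4, actions -1/+1,
# reward 10 for reaching state 4, otherwise -1 per step.
def step(s, a):
    s2 = max(0, min(4, s + a))
    return s2, (10.0 if s2 == 4 else -1.0)

def features(s, a):
    # One-hot features over (state, action); this is the part a
    # deep-RL system would replace with a neural network.
    phi = [0.0] * 10
    phi[s * 2 + (0 if a == -1 else 1)] = 1.0
    return phi

def q(w, s, a):
    return sum(wi * xi for wi, xi in zip(w, features(s, a)))

def q_learning(episodes=500, eta=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    w = [0.0] * 10
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            if rng.random() < epsilon:
                a = rng.choice([-1, 1])                      # explore
            else:
                a = max([-1, 1], key=lambda b: q(w, s, b))   # exploit
            s2, r = step(s, a)
            # TD target: reward plus discounted value of the next state.
            target = r if s2 == 4 else r + gamma * max(q(w, s2, b) for b in (-1, 1))
            error = q(w, s, a) - target
            for i, xi in enumerate(features(s, a)):
                w[i] -= eta * error * xi                     # gradient step on w
            s = s2
            if s == 4:
                break
    return w

w = q_learning()
```

After training, the greedy policy reads off argmax over actions of Q(s, a), which on this toy chain moves right toward the reward at the end.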
There are also methods like actor-critic that try to combine these value-based methods and policy-based methods. These are used in DeepMind's AlphaGo and AlphaZero programs for, you know, crushing humans at Go — this one will actually be deferred to next week's section, because it's in the context of games. [01:13:46] There are a bunch of other applications: you can fly helicopters, play backgammon — this is actually one of the early examples; TD-Gammon in the nineties was one of the success stories of using reinforcement learning, and in particular self-play. For non-games, reinforcement learning can be used to do things like scheduling and managing data centers, and so on. [01:14:12] Okay, so that concludes the section on Markov decision processes, where the idea is that we are playing against nature — nature is kind of random, but, you
know, kind of neutral. Next time we're going to play against an opponent that's out to get you, so we'll see about that.
================================================================================ LECTURE 021 ================================================================================
Game Playing 1 - Minimax, Alpha-beta Pruning | Stanford CS221: AI (Autumn 2019)
Source: https://www.youtube.com/watch?v=3pU-Hrz_xy4
---
Transcript
[00:00:04] All right, let's start, guys. Okay, so a few announcements before we start. If you need OAE accommodations, please let us know if you haven't done that already — you need to let us know by October 31st, because we need to figure out the alternate exam date. We'll get back to you about the exact details around the alternate exam date, but let us know by October 31st. Project proposals are also due this Thursday, so talk to the TAs, talk to us, come to office hours. [00:00:40] All right, so today we want to talk about games. We have started talking about this idea of state-based models — the fact that you want to have state as a way
of representing everything that we need to plan for the future. We talked about search problems already, and we talked about MDPs, where we have a setting in which we're playing against nature, and nature can play probabilistically, and based on that we need to respond. And today we want to talk about games. So the setup is: we have two players playing against each other. We're not necessarily playing against nature, which can act probabilistically — we're actually playing against another intelligent agent that's deciding for his own or her own good. So that's the main idea of games. [00:01:27] All right, so let's start with an example — this is actually an example that we're going to use throughout the lecture. The example is: we have three buckets, A, B, and C, and then you're choosing one of these three
buckets, and then I choose a number from the bucket. Your goal here is to maximize the chosen number, and the question is: which bucket would you choose? [00:01:46] Okay, so how many of you would choose bucket A? No one trusts me, okay. How many of you would choose B? Okay. So now, if people don't trust me, how many of you would choose C? Okay, so there's a number of people there. [00:02:08] So how are you making that decision? If you choose A, you're basically assuming that I'm not trying to get you — I might actually give you 50, and if I give you fifty, that'll be awesome: you have this very large value that you're trying to maximize. If you think I'm going to act adversarially and go against you and try to minimize your number, then you're going to choose bucket B, right,
because, worst-case scenario, I'll choose the lowest number in the bucket, and in bucket B the lowest number is one, which is better than minus 50 and minus 5. So if you're assuming I'm trying to minimize your good, then you're going to choose bucket B. And if you have no idea how I'm playing, and you're just assuming maybe I'm acting stochastically — maybe I'm flipping a coin and then based on that deciding what number to give you — you might choose C, because in expectation C is not bad. If you just average out the numbers in A and B and C, the average value for A is 0, for B it's 2, and for C it's 5. So if I'm playing stochastically, you might say: oh, I'm probably going to get something around 5, so you would pick C.
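The three models of the opponent just described can be written down directly. The bucket contents below are hypothetical values consistent with the numbers mentioned (A's extremes are -50 and 50, average 0; B's worst case is 1, average 2; C's worst case is -5, average 5):

```python
# Hypothetical bucket contents consistent with the numbers in the lecture:
# A = {-50, 50}, B's minimum is 1 with average 2, C's minimum is -5 with average 5.
buckets = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}

def value(numbers, opponent):
    """Value of a bucket under a model of the opponent:
    'max' = helpful opponent, 'min' = adversarial, 'mean' = random (uniform)."""
    if opponent == "max":
        return max(numbers)
    if opponent == "min":
        return min(numbers)
    return sum(numbers) / len(numbers)

def best_bucket(opponent):
    return max(buckets, key=lambda b: value(buckets[b], opponent))
```

`best_bucket("max")` gives A, `best_bucket("min")` gives B (the minimax-style answer), and `best_bucket("mean")` gives C (the expectimax-style answer) — matching the three answers above.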
[00:03:22] Okay, so today we want to talk about these different policies that you might choose in these settings, how we should model our opponent, and how we formalize these problems as game problems — this is the example that we just started with. [00:03:32] So the plan is to formalize games, then talk about how we compute values in the setting of games — we're going to talk about expectimax and minimax — and then towards the end of the lecture we're going to talk about how to make things faster. We're going to talk about evaluation functions as a way of making things faster, which is using domain knowledge to define evaluation functions over nodes, and we're also going to talk about alpha-beta pruning, which is a more general way of pruning your tree and making things faster. All right, so that's the plan for today. [00:04:04] Okay, so we just defined this game, and the way to go about this game is to create
something that's called a game tree. A game tree is very similar to a search tree — this might remind you of the search trees we talked about two weeks ago. [00:04:18] So the idea is we have this game tree, where we have nodes in the tree, and each node is a decision point for a player. We have different players here — I was playing, or you were playing; we have two different people playing here — so each of these decision nodes belongs to one of the players, not both of them. And then each root-to-leaf path is going to be a possible outcome of the game. [00:04:40] So it could be that your decision was to pick bucket A and then I chose minus 50; that path gives us one possible outcome of how things can go. That is what the tree is representing here. So the nodes in
the first level are the decisions that I was making, and the first node, the root node, is the decision that you were making in this setting. [00:05:05] So if we were to formalize this a little bit more, we're going to formalize this problem as a [00:05:09] two-player zero-sum game. In this class, at least today, we're going to talk about two-player games, where we have an agent and we have an opponent, and then we're going to talk about policies and values. For all of those things, think of yourself as being the agent: you're playing for the agent, you're optimizing for the agent, and the opponent is playing against you. [00:05:38] Today we're also going to talk about games that are turn-taking games — we're going to talk about things like chess. We're not talking about things like rock-paper-scissors; we'll actually talk about that next time, when we have simultaneous
games, where you're playing simultaneously. Today we're talking about turn-taking settings — two-player, turn-taking. Full observability: we see everything. We're not talking about games like poker, where you have partial observation and you don't necessarily see the hand of your opponent. So: full observation, two-player, and also zero-sum games. What zero-sum means is that if I'm winning — if I'm getting, like, ten dollars from winning — then my opponent is losing ten dollars, so the total utility is going to be equal to zero: if I win some amount, my opponent is losing the same amount. [00:06:22] All right, so what are the things that we need when we define games? We need to know the players: we have the agent, we have the opponent. In addition to that, you need to define a bunch of things, and this should remind you of the search lecture. So we
might have a start state — that's where we start. [00:06:35] We have actions, which is a function of state and gives us the possible actions from a state s. Similar to before, you have a successor function, like in search problems: the successor function takes a state and an action and tells us the resulting state you're going to end up in. And you have an isEnd function, which checks whether you're in an end state or not. There are two things that are different here. One is this utility function, and the utility function basically gives us the agent's utility at the end state. [00:07:10] So one thing to notice here is that the utility only comes at an end state. After you finish the game — like, I've played my chess game and I've won this chess game — then I get my utility. As I'm making moves through my chess game, I'm not getting
any utility — you only get the utility at an end state. And note the way we defined the utility: we're defining it for the agent, because again, we're playing from the perspective of the agent. So what would be the utility of the opponent? Minus that, right — the negation of that would be the utility of the opponent. [00:07:47] [Student: I've heard about partially observable Markov decision processes — is this kind of what that is?] Okay, so the question is: is this a partially observable Markov decision process? This is not a partially observable Markov decision process. There are classes that talk about decision-making under uncertainty — Mykel Kochenderfer's class actually teaches that, so you should take classes on that. This is not a partially observable Markov decision process; this is fully observable, and you have two
players playing against each other, so it's a very different setup. [00:08:20] So the question is: is there any randomness here? So far I haven't discussed any randomness yet; later in the lecture I'll actually talk about the case where there might be a nature in the middle that acts randomly, and how we go about that. But so far, it's two players playing against each other. [00:08:36] All right, and then the other thing that we need to define when we're defining a game is the player. A player is a function of state that basically tells us who is in control — who is playing now. In the game of chess, whose turn is it now? Player is a function that we're going to define when we're formally defining a game. [00:08:56] All right, so let's look at an example. We have a game of chess; the players are White and Black. Let's say you're playing for White, so the agent is White, and the
opponent is Black. The state s can represent the position of all the pieces and whose turn it is — that is what the state is representing: whose player's turn it is and the position of all the pieces. Actions would be all the legal chess moves that Player(s) can take, and isEnd basically checks if the state is checkmate or a draw. [00:09:29] So then what would the utility be? You're only going to get it when you win, or when you lose, or if there's a draw. The way we're defining it: it's going to be, let's say, plus infinity if White wins — because the agent is White — zero if there's a draw, and minus infinity if Black wins. So that was all the things that we would need to define. Yes? [00:10:01] [Student: why do we have whose turn it is in the state?]
So that's one way of actually extracting the player function: you can define the player function as a function of state, because the state already needs to encode whose turn it is, so you can extract it that way. [00:10:19] [Student: you said the utility would be negative for the opponent — is that assuming both are taking the same actions?] No — this is turn-taking, right? I take an action, then the opponent takes an action, then the agent takes an action, the opponent takes an action, and at the very end of the game you get the utility, and the opponent gets the negative of that utility. But the actions could be very different, the policies could be very different, and we'll talk about how to come up with those. [00:10:43] [Student: if White wins you get plus infinity, but what if
Black wins and you don't have a zero-sum game?] We'll talk about that next lecture, actually, a little bit. [00:11:00] I'm talking about zero-sum games here because the algorithms we're talking about are for zero-sum games — we're going to talk about minimax-type policies, where the opponent is minimizing and the agent is maximizing. So I'll get back to that, and if I haven't answered it we can talk about it after class, but next lecture we'll also talk about more variations of games. For now I'm making a bunch of simplifying assumptions, like turn-taking. [00:11:33] [Student: why do the utilities add up to zero?] Yeah, so the utilities need to add up to zero: if White wins, maybe White gets ten, but then Black gets minus ten.
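As a sketch, the ingredients listed so far — start state, actions, successor, isEnd, utility, and the player function — can be written down for the bucket example. This is illustrative, not the course's code: the state encoding (a player/choice pair, with player 0 marking an end state) and the bucket contents are assumptions.

```python
class BucketGame:
    """The bucket example written against the ingredients just listed.
    State is (player, choice): player +1 is the agent, -1 the opponent,
    and player 0 marks an end state (an encoding made up for this sketch)."""
    BUCKETS = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}  # hypothetical values

    def start_state(self):
        return (+1, None)                       # agent moves first

    def player(self, state):
        return state[0]

    def actions(self, state):
        _, choice = state
        if choice is None:
            return list(self.BUCKETS)           # agent picks a bucket
        return list(self.BUCKETS[choice])       # opponent picks a number

    def succ(self, state, action):
        _, choice = state
        if choice is None:
            return (-1, action)                 # opponent moves next
        return (0, action)                      # number chosen, game over

    def is_end(self, state):
        return state[0] == 0

    def utility(self, state):
        assert self.is_end(state)               # utility only at end states
        return state[1]
```

By the zero-sum assumption, the opponent's utility at an end state is just the negation of `utility(state)`.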
[00:11:46] So the characteristics of games that we've already discussed come down to two main things. One is that all utilities are at the end state: throughout the path you're not getting utilities, as opposed to something like MDPs, where we were getting rewards throughout the path. Here the utility only comes in at the very end, at the end state. The other is that different players are in control at different states: if you're in a state, you might not be able to control things; it might be your opponent's turn, and you might not be able to do anything. Okay, so those are the two main characteristics of games.

[00:12:19] All right, so let's look at a game that you're going to play. The game is the halving game: we start with a number N, and the players take turns. They can do two things: they can either subtract one, decrementing N, or they can replace N with N over two. So they can divide or subtract. And the player that's left with zero is going to win. Okay, so that's the setup.

[00:12:45] So let's try to formalize the game, and after that we'll figure out what a good policy for it is. For now, let's just formalize it: what are all the different pieces of the model? Let's open a new file where we're going to define this game, a HalvingGame class. We're initializing with n, so we're starting with some number n. What is our state? Our state is going to encode whose turn it is and that number n. So we have a player; let's say our players are either plus one or minus one, and that's how I'm labeling whose turn it is. The start state: let's say player plus one plays first with n, so that is (+1, n),
and then we need to define isEnd. What should isEnd check? Well, we take the state, decouple it into player and number, and if the number is equal to 0, that's when the game ends. That's our ending condition.

[00:13:54] How about utility? We get the utility at an end state. So again I take a state, decouple it into player and number, and make sure we're in an end state: we assert that number is equal to 0, because that's what defines whether you're in an end state. Then the utility: if I'm winning I get infinity, and if I'm not winning I get minus infinity. The way I'm defining that here is just player times infinity. I, the agent, am player plus one, and the opponent is player minus one, so if minus one is winning I get minus infinity.

[00:14:32] The actions we can take: we can subtract one or we can divide by two; subtract and divide are the two actions. The player function again takes a state; I decouple the state into player and number and just return the player. That's how I know whose turn it is.

[00:14:51] Then we need to define the successor function. The successor function takes a state and an action and tells us what state you're going to end up in. So again I decouple the state into a player and a number, and there are two actions I can take: subtract one or divide by two. If I'm subtracting, I return a new state whose player is -player (because now it's the other player's turn, whoever's turn it is flips) together with number minus one. If the action is divide, we return the new player, which is -player, and number over two.
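The formalization live-coded here can be sketched in Python roughly as follows. The class and method names (HalvingGame, startState, isEnd, utility, actions, player, succ) follow the lecture's conventions, but the exact code is a reconstruction from the description, not a copy of the lecture's file:

```python
class HalvingGame:
    def __init__(self, N):
        self.N = N  # starting number

    # State = (player, number); players are +1 (agent) and -1 (opponent).
    def startState(self):
        return (+1, self.N)

    def isEnd(self, state):
        player, number = state
        return number == 0

    def utility(self, state):
        # Utilities exist only at end states: player times infinity,
        # so +inf if the agent (+1) is the winner, -inf otherwise.
        player, number = state
        assert number == 0
        return player * float('inf')

    def actions(self, state):
        return ['-', '/']  # subtract one, or divide by two

    def player(self, state):
        player, number = state
        return player

    def succ(self, state, action):
        # Taking an action flips whose turn it is and updates the number.
        player, number = state
        if action == '-':
            return (-player, number - 1)
        return (-player, number // 2)
```

The `//` (integer division) is an assumption; the lecture just says "replace n with n over two."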
Okay, that is it; so we just defined this game.

[00:15:40] All right, so that was my game. We're going to play this game in a little bit, but before playing it, let's quickly talk about what a solution to a game is. What are we trying to do in a game? If you remember MDPs, the solution was a policy: a policy was a function of state, and it would return the action you need to take in that state. Similarly, here we have policies, but the thing is, I have two players, so the policy should depend on the player too. So I have pi_p, which is the policy of player p, and I can define it similarly to before: it can be a function of a state that returns just an action. That would be a deterministic policy: deterministically, if I'm in a state, the policy tells me what action to take. [00:16:26] But we can also define
stochastic policies. What a stochastic policy does is take a state and an action and return a number between 0 and 1, which is the probability of taking that action. So the policy pi_p of a state and an action returns the probability of player p taking action a in state s. If you remember the bucket example: maybe half the time I'd pick the number on the right and half the time I'd pick the number on the left. That would be a stochastic policy; I'm not deterministically telling you what the action is, I'm giving you a stochastic description of the policy I'm following. Okay, so we have deterministic policies and stochastic policies, and in our game we could follow either one of them.

[00:17:14] [Student question:] Under what case would you want a stochastic policy versus a deterministic policy?

We'll cover that a little bit more next time. Depending on what game you're in, there are settings where stochastic policies give us some properties and deterministic policies give us other properties; right now we're just defining them as things that could exist. We could model our opponent as acting deterministically if we know exactly what they're doing; sometimes we have no idea. Maybe I've learned their behavior somehow and there's some randomness there, and then I'd use a stochastic policy for how my opponent is going to play against me. But we'll talk about what we get out of stochastic versus deterministic policies a little bit more next time. Yes, all right.
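As a rough sketch of the two representations (the function names here are illustrative, not from the lecture): a deterministic policy maps a state to an action, a stochastic policy maps a state-action pair to a probability, and a deterministic policy is the special case that puts probability one on a single action:

```python
# Deterministic policy: state -> action.  For the halving game a state
# is (player, number); this illustrative policy always decrements.
def alwaysSubtractPolicy(state):
    return '-'

# Stochastic policy: (state, action) -> probability in [0, 1].
# Like the bucket example: each of the two choices half the time.
def uniformPolicy(state, action):
    return 0.5

# View a deterministic policy as a stochastic one: probability 1 on
# the chosen action, 0 on everything else.
def asStochastic(detPolicy):
    def pi(state, action):
        return 1.0 if detPolicy(state) == action else 0.0
    return pi
```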
[00:18:07] Okay, so now that we know that it's a policy we want, let's try to write up a policy for this game. I'm going to define a human policy, and what I mean by that is that it's going to come from a human; that means one of you, or two of you, so I'll need two volunteers. But first let's quickly write it up. What is a human policy? It just gets its input from the keyboard. So what I'm writing here is: get the action from the keyboard (remember the actions are either divide or subtract one), and if the action is valid, return that action. That sounds like a good policy.

[00:18:53] So that's a human policy. Now I want this game to actually play out, so I need policies for my players: my agent is plus one, and that's going to be a human policy, and my opponent is also going to be a human policy. I just want two humans playing against each other. And for the game, let's say we're starting with 15, so the number we're starting with is 15. [00:19:23] All right, that looks right to me.

[00:19:28] So how do we make sure we're progressing in the game? If you're not in an end state, you want to progress. So let's print a few things here: print our state. Let's get the player out of the state, because again the state encodes the player. Let's get the policy, because we've defined these policies for both of the players, so we can look up the policy for whoever is playing right now. Then the action comes from the policy in that state,
and the new state you're going to end up in is just the successor of the current state and action, so I'm just progressing. This while loop figures out what state we're in, what policy we're following, and where we end up via the successor function. And at the very end I just print out the utility, which is either plus infinity or minus infinity. That sounds good.

[00:20:32] All right, so who wants to play this? Okay, that's one person: you're the agent, player plus one. And you in the white shirt: you're minus one. All right, so let's play. [00:20:58] Player plus one, we're at number 15. You want to decrement? Okay. So you're player minus one, we're at 14; what do you want to do? Divide? Okay, you have a policy there, minus one. [00:21:23] Oh yeah, so I think you get the point, right?
[00:21:36] [Crosstalk as the game finishes.] You get the utility at the end. I was going to try another pair, but the code is online; if you want to play with it, just play with it, and we'll have one other version playing against an automated policy later.

[00:21:58] All right, so we're back here. We just saw how we can write human policies and have them play against each other. And again, a policy: you give it a state and an action and it gives you a probability, or you give it a state and it gives you an action. A deterministic policy is just an instance of a stochastic policy: if you have a deterministic policy, you can treat it as a stochastic policy where with probability one you pick a particular action.
All right, so now we want to talk about how we evaluate a game. Let's say someone comes and gives me the policy of an agent and of an opponent, and I just want to know how good that is. If you remember, the MDP lecture started with policy evaluation, this idea that someone gives me the policy and you just want to evaluate how good it is, and we're going to do something exactly analogous to that. Someone comes and tells me: my agent is going to pick bucket A, that's what my agent will do all the time. And someone says: my opponent is going to act stochastically, and with probability 1/2 it picks one of the two numbers. Okay, so those are the two policies we're given, and the question is: how good is this?

[00:23:15] Going back to the game tree, what's really happening is that my agent is going to pick this branch, because it picks bucket A. So with probability 1 we end up here, and with probability 0 we end up in any of the other buckets. Then my opponent stochastically picks either minus 50 or 50. If my opponent is picking minus 50 or 50, the value of that node is just the expectation: 50% of the time it's minus 50 and 50% of the time it's 50, so the value of that node is zero. And then if my agent is picking A, the value of this node is going to be zero too. [00:23:55] So you can see how the value propagates up from the utilities: the utilities are at the leaf nodes, but we can actually compute a value for each one of these nodes if we know what the policies are;
if I know who is following what policy, I can compute these values and go up the tree. So in this case I can say the value of the start state, when I'm evaluating this particular policy, is equal to zero. [00:24:20] All right: someone gave me the policy, and I evaluated the value of the start state. In general, as I was saying earlier, this is similar to policy evaluation, the case where someone gives me the policies and I evaluate how good the situation is, and you can write a recurrence to actually compute that.

[00:24:41] So let me write the recurrence here. We want to compute this value, V_eval: it's evaluating a given policy, and it's a function of state. What is it going to be equal to? It's Utility(s) if we're already in an end state. Otherwise, I have access to the policy of my agent and the policy of my opponent, so I can just take an expectation. If Player(s) is the agent, it's a sum over all possible actions a of pi_agent(s, a) (say it's a stochastic policy) times V_eval of the successor state, Succ(s, a); that's the case where Player(s) equals agent. [00:25:46] And what happens if my player is the opponent? I do the same thing. I have access to the policy of the opponent (it's given to me), so it's again a sum over all possible actions a of pi_opp(s, a) times V_eval of the successor state Succ(s, a), and that's the case where Player(s) is the opponent.
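The recurrence on the board can be sketched as a recursive function. The bucket game below is a minimal stand-in for the lecture's example; the numbers in buckets B and C are assumptions chosen to reproduce the node values 0, 2, and 5 quoted in lecture:

```python
def valueEval(game, policies, state):
    # V_eval(s) = Utility(s) at an end state; otherwise the expectation
    # over actions under the current player's given stochastic policy
    # pi(state, action) -> probability.
    if game.isEnd(state):
        return game.utility(state)
    pi = policies[game.player(state)]
    return sum(pi(state, a) * valueEval(game, policies, game.succ(state, a))
               for a in game.actions(state))

class BucketGame:
    # Two-move game: the agent (+1) picks a bucket, then the opponent (-1)
    # picks one of the two numbers in it (bucket contents assumed).
    buckets = {'A': [-50, 50], 'B': [1, 3], 'C': [-5, 15]}
    def startState(self): return ('agent',)
    def isEnd(self, state): return state[0] == 'end'
    def utility(self, state): return state[1]
    def player(self, state): return +1 if state[0] == 'agent' else -1
    def actions(self, state):
        return list(self.buckets) if state[0] == 'agent' else [0, 1]
    def succ(self, state, action):
        if state[0] == 'agent':
            return ('opp', action)
        return ('end', self.buckets[state[1]][action])
```

Evaluating the policies from the example (the agent always picks bucket A, the opponent picks either number with probability 1/2) gives a start-state value of 0, matching the walk-through above.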
So that's the recurrence we were going to write, and it's pretty intuitive; we've seen this in tree search too. You start with the utilities at the leaf nodes and push them back up, based on what your policies are and what they tell you about which edges of the tree you take with what probability. Okay, does this make sense? All right.

[00:26:40] Okay, so that was evaluating the game. But what if now I want to solve for what the agent should do? I am the agent; I care about figuring out what my pi_agent is. I don't know what my pi_agent is; I need to figure out what sort of policy I should be following. And that takes us to this idea of expectimax, which is basically: if I'm in a scenario where I know what my opponent does (I'm still assuming I know what my opponent does), what would be the best thing for me to do as an agent? [00:27:11] What would be the best thing to do? Like in the bucket example, where I was acting probabilistically, what would you do?

[00:27:23] [Student: pick the action that gives you the maximum.] Right, you pick the action that gives you the maximum value, because you're trying to maximize your own value. So then this recurrence needs to change. The way it changes: I'm going to call this a new value, and I'll just write on top of what we have rather than rewriting it. I'm going to call it the value of the expectimax policy. So this V_eval: I'm not evaluating anything anymore, I actually want to figure out what my agent should do, so I'm going to call it V_expectimax. And if I know the policy of my opponent, I'm not changing anything in the opponent case, because I know that policy; I just compute the expectation. But now I want to figure out what
the agent should [00:28:12] want to figure out what the agent should do and what should the agent do mall the [00:28:13] do and what should the agent do mall the agent should do the thing that maximizes [00:28:15] agent should do the thing that maximizes this value so I'm gonna erase this sum [00:28:19] this value so I'm gonna erase this sum with the policy because I don't have [00:28:21] with the policy because I don't have that policy and the agent should do the [00:28:23] that policy and the agent should do the thing that maximizes this value so this [00:28:32] thing that maximizes this value so this should remind you of value iteration so [00:28:35] should remind you of value iteration so if you remember value iteration in the [00:28:37] if you remember value iteration in the MVP lecture if you weren't evaluating [00:28:40] MVP lecture if you weren't evaluating things right you were trying to maximize [00:28:41] things right you were trying to maximize our value and that's kind of like [00:28:43] our value and that's kind of like analogous to what we are doing here [00:28:45] analogous to what we are doing here you're trying to figure out what should [00:28:47] you're trying to figure out what should be the policy that the agent should take [00:28:48] be the policy that the agent should take that maximizes the value under the [00:28:51] that maximizes the value under the scenario that I know what the opponent [00:28:52] scenario that I know what the opponent does so I still kind of know what the [00:28:54] does so I still kind of know what the opponent does so going back to this [00:28:57] opponent does so going back to this example so let's say I know my opponent [00:28:59] example so let's say I know my opponent is acting stochastically what should I [00:29:01] is acting stochastically what should I do [00:29:01] do so if my opponent is acting [00:29:03] so if my opponent is acting stochastically with probability 1/2 then [00:29:05] stochastically with 
probability 1/2 then the values of each one of these buckets [00:29:07] the values of each one of these buckets are going to be 0 2 and 5 and I'm trying [00:29:10] are going to be 0 2 and 5 and I'm trying to maximize my own you to my own value [00:29:13] to maximize my own you to my own value so I'm gonna pick the one that gives me [00:29:15] so I'm gonna pick the one that gives me 5 and then that's shown with this upward [00:29:16] 5 and then that's shown with this upward triangle I'm trying to maximize so I'm [00:29:18] triangle I'm trying to maximize so I'm gonna pick buckets see because I'm [00:29:20] gonna pick buckets see because I'm trying to maximize under this knowledge [00:29:22] trying to maximize under this knowledge that the other agent is stochastic [00:29:24] that the other agent is stochastic reacting and and then we're calling this [00:29:29] reacting and and then we're calling this the value of expecting max policy and [00:29:31] the value of expecting max policy and the value of expecting max policy from [00:29:33] the value of expecting max policy from the start state is equal to 5 right [00:29:36] the start state is equal to 5 right because that's that's [00:29:37] because that's that's you did I think I'm gonna get posture - [00:29:41] yes this is assuming I know my opponents [00:29:44] yes this is assuming I know my opponents policy and then I'm following you Nick I [00:29:46] policy and then I'm following you Nick I guess so I'm maximizing my own [00:29:49] guess so I'm maximizing my own you took my own value knowing that my [00:29:51] you took my own value knowing that my opponent is following this policy and [00:29:53] opponent is following this policy and what the opponent would do an [00:29:54] what the opponent would do an expectation alright so and then this is [00:29:57] expectation alright so and then this is the this is the recurrence that you [00:29:59] the this is the recurrence that you would get we would just update the 
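As a concrete sketch of the recurrence just described — a minimal reconstruction, not the course's actual code — here is the expectimax computation for the bucket example, with buckets A = {-50, 50}, B = {1, 3}, C = {-5, 15} and an opponent known to pick either number with probability 1/2:

```python
# Bucket contents from the lecture's example; the opponent's known policy
# picks either number in the chosen bucket with probability 1/2.
buckets = {'A': [-50, 50], 'B': [1, 3], 'C': [-5, 15]}

def opponent_value(bucket):
    # Opponent node: expected utility under the known stochastic policy.
    return sum(0.5 * u for u in buckets[bucket])

def expectimax_action():
    # Agent node: maximize over choices, given the opponent's known policy.
    return max(buckets, key=opponent_value)

print(expectimax_action(), opponent_value(expectimax_action()))  # C 5.0
```

The expected bucket values come out to 0, 2 and 5, so the agent picks bucket C with value 5, matching the tree on the board.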
[00:30:01] If the agent is playing, then we maximize the value of expectimax. Okay. Now, in general I don't know the policy of my opponent, right? In general I just know what gives me this payoff. So if that is the case, what should we do? One thing we could do is assume the worst case. You could say: oh, the opponent is trying to get me, they're going to play the worst-case scenario, they're trying to minimize my value. That's the first thing to try — and we'll talk about whether that is always the best thing to do a little bit later in the lecture — but for now, if I know nothing about my opponent, I can just assume my opponent is acting adversarially against me. And that introduces the idea of minimax, as opposed to expectimax.
[00:30:55] So what would minimax be? In the case of a minimax policy, what I am assuming is: I am the agent, trying to maximize my own value, and I'm assuming my opponent is acting adversarially — my opponent is really trying to minimize my value. What that means is: from this bucket I'm going to get minus 50, from this one I'm going to get 1, and from this one I'm going to get minus 5. Under that assumption I'm going to pick the second bucket, because that gives me the highest value. So that is a minimax policy. [00:31:31] How would I change my recurrence if I were to play minimax? I'm going to call it V of minimax — let's look at the V of minimax of a state. The recurrence is going to be over V of minimax, so let me change that. [00:31:52] If the agent is playing, the agent is still trying to maximize the value, so that is all good. What if the opponent is playing? The opponent is going to minimize, right? I don't have access to pi of the opponent, so I'm going to remove this and say: the opponent takes the action that minimizes the value of the successor of s and a. And this is how you would compute the value of the minimax policy. [00:32:43] What happens if the adversary is not always adversarial? In that case you have another stochastic policy that defines what the opponent is doing, and if you have access to it you can do something similar to expectimax. If you don't have access to that, maybe you would want to act worst-case and assume they're always trying to minimize. But that's prior knowledge that you have, and it allows you to act better, or maybe evaluate the value better for every state.
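Written out — my reconstruction of the board, using the course's usual IsEnd/Utility/Succ notation — the recurrence being described is:

```latex
V_{\text{minimax}}(s) =
\begin{cases}
\text{Utility}(s) & \text{if } \text{IsEnd}(s)\\[2pt]
\max_{a \in \text{Actions}(s)} V_{\text{minimax}}(\text{Succ}(s,a)) & \text{if } \text{Player}(s) = \text{agent}\\[2pt]
\min_{a \in \text{Actions}(s)} V_{\text{minimax}}(\text{Succ}(s,a)) & \text{if } \text{Player}(s) = \text{opp}
\end{cases}
```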
[00:33:07] So we will talk about evaluation functions a little bit later in the lecture, and maybe that knowledge can inform your evaluation function. All right, so here the value of minimax from the start state is going to be 1, right? Does everyone see that? I'm assuming my opponent is acting adversarially, so we have minus 50, 1 and minus 5; if I am maximizing, the best I can get is 1. That's how we compute V of minimax. [00:33:36] And there is really no analogy to this in the MDP setting, because in the MDP setting we don't have this game — we don't have this opponent playing against us — and this is the recurrence that you're going to get, which is what we already have on the board. [00:33:49] Okay, so what would the policy be? The policy is just going to be the argmax of this V of minimax. If you want to know what the policy of your agent should be, that's pi max: it's the argmax of V of minimax over the successors of that state. And if you want to know what the policy of your opponent at state s should be, that's the argmin of V of minimax, which is intuitive. So that way you can actually figure out what the actual action should be. [00:34:20] All right, so let's go back to this example, this halving game. What we want to do is actually code up what a minimax policy would do in this setting, and maybe we can play against a minimax policy after that. [00:34:36] So what would a minimax policy do? It's a policy, so it's going to be a function of states: give it the state, and we just write the recursion that we have on the board. So we're recursing over the state. If you're in an end state, then what are we returning? Just the utility.
[00:34:55] Okay, so we're returning the utility of that state — there are no actions there. If you're not in an end state, then you're either maximizing or minimizing over a set of choices, so let's actually create those choices so we can just call max and min on them. For the choices we iterate over all actions that we have. [00:35:23] And what is each choice going to be, exactly? It's going to be a recursion over the successor state: we recurse over the game's successor of the state and action. I'm going to return the action here too, because I just want to get the policy later. This recursion returns a value and an action, so I get the value from the first component and the action from the second. [00:35:49] Okay: if player is plus 1, that's the agent, and the agent should maximize over the choices; if player is minus 1, that's the opponent, and the opponent should try to minimize over these choices. That's pretty much the recursion we have on the board, and that's our recursive function. [00:36:13] So we recurse over our state, and that gives us a value and also an action. Let's just print things out so you can refer to them: minimax gives us an action, and it tells us the value that you can get. And then it's a policy, so let's just return the action. [00:36:42] Okay, so now I'm going to say the plus-one agent is still a human policy, and it's playing against a minimax policy. All right — who wants to play with this? It's a little scarier to play against a minimax policy. [00:37:01] All right, so let's do this: python... You are the agent, you're player one, you're starting from 15, what do you want to do? ... So you just lost the game. Why do I know you lost the game? Now it's player minus one, we are at 7, and the minimax policy took action minus, so we're at 6, and the value of the game is minus infinity. You're playing against a minimax policy and you're already getting minus infinity, so you just lost. Anyone want to try this again? [00:37:53] Subtract — okay, so you can win, right? The value is infinity right now. And then the minimax policy also did a minus, so we are at 13 right now. It's your turn, you're at 13... and you just lost the game again, so minus infinity. Yeah — actually you need to alternate between the actions; I think that is the best policy. But hopefully this gives you a sense of how this runs. The code is online, so feel free to play with it and figure out the best policy to use. All right.
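Since the transcript doesn't reproduce the code itself, here is a minimal sketch of what the halving game and minimax policy might look like. All names here are my own, and I'm assuming the convention that whoever moves the number to 0 wins (utility plus infinity if that's the agent, minus infinity otherwise); the course's actual code may use a different convention:

```python
class HalvingGame:
    """Start from n; each turn a player subtracts 1 ('-') or halves ('//').
    Assumption in this sketch: whoever moves the number to 0 wins."""
    def __init__(self, n):
        self.n = n

    def start_state(self):
        return (+1, self.n)          # (player to move, current number)

    def is_end(self, state):
        return state[1] == 0

    def utility(self, state):
        player, number = state
        assert number == 0
        # The previous mover (-player) brought the number to 0 and wins.
        return float('inf') if -player == +1 else float('-inf')

    def actions(self, state):
        return ['-', '//']

    def player(self, state):
        return state[0]

    def succ(self, state, action):
        player, number = state
        if action == '-':
            return (-player, number - 1)
        return (-player, number // 2)

def minimax(game, state):
    """The board recurrence: returns (value, best action) at `state`."""
    if game.is_end(state):
        return (game.utility(state), None)
    # One (value, action) choice per legal action, recursing on the successor.
    choices = [(minimax(game, game.succ(state, a))[0], a)
               for a in game.actions(state)]
    if game.player(state) == +1:
        return max(choices)          # agent maximizes
    return min(choices)              # opponent minimizes

def minimax_policy(game, state):
    value, action = minimax(game, state)
    print(f'minimax says action = {action}, value = {value}')
    return action

game = HalvingGame(15)
minimax_policy(game, game.start_state())
```

Taking max and min over (value, action) tuples returns both the best value and an action achieving it in one pass, which is what lets the same recursion serve as both an evaluator and a policy.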
[00:38:37] So okay, that was the minimax policy, and this is the recurrence we get for a minimax policy. Now I want to spend a little bit of time talking about some properties of this minimax policy. We've talked about two types of policies so far, right? We've talked about expectimax, which basically says: I, as the agent, am trying to maximize, but I know what my opponent is going to do, so I assume my opponent does whatever it does and I maximize based on that. [00:39:10] I'm going to refer to that as pi of expectimax — everything in red is for the agent, everything in blue is for the opponent. So the agent is following this policy, which says: I'm going to maximize, assuming my opponent is doing whatever it does. Here I'm calling pi 7 some opponent policy; it could be anything, but let's say the opponent is playing pi 7, I maximize based on that, and the value is the value of expectimax we just talked about. [00:39:42] The other value we just talked about is the value of minimax, which says: I am the agent, I am going to maximize assuming the opponent is going to minimize — and the opponent actually does minimize, following pi min. Okay, so these are the two values we have talked about so far. I want to talk a little bit about their properties, but before that — sorry? [00:40:01] [student] Wait, could we kind of mix the two together? Like, heighten the probability of taking the minimum: in expectimax we have a probability distribution over the actions, right? So why don't we just take the action that minimizes whatever our reward is and give it a higher weight? [00:40:17] I didn't fully follow — are you coming up with a new policy that you think would be better, somewhere between expectimax and minimax? This table might address that, because it considers four different cases, not just the two, so it might actually relate to what you were proposing. Let's go through it first, and then come back if it doesn't answer that. [00:40:52] All right, so I want to talk about a setting. This table is actually not that confusing, but it can get confusing, so do pay attention to this part. Maybe I'll write it over there: I'm going to use red for the agent policy and blue for the opponent policy. [00:41:25] Then for the agent we have pi max — the agent could play pi max; what does that mean again? I'm going to maximize, assuming you're going to minimize. And the agent could play pi expectimax of some policy pi 7 — I'm going to put 7 here — which means: I'm going to maximize, assuming you're going to follow this pi 7. So these are the things the agent can do. [00:41:53] And then there are things my opponent can do; I'm going to write those here. My opponent can follow pi min, which is: I'm just going to minimize. Or my opponent could follow some other policy pi 7 — let's say pi 7 in the bucket example is just acting stochastically, so half the time pick one number, half the time pick the other. Okay, so that is what we have. [00:42:18] I'm going to draw my tree so we can go over examples of this too. This was the bucket example: we start with minus 50 and 50 in bucket A, 1 and 3 in bucket B, and minus 5 and 15 in bucket C.
[00:42:26] bucket example we started at minus 50 and 50 in bucket a 1 and 3 in bucket B [00:42:31] and 50 in bucket a 1 and 3 in bucket B minus 5 and 15 in bucket C ok so this [00:42:34] minus 5 and 15 in bucket C ok so this was my buckets example I'm actually [00:42:36] was my buckets example I'm actually going to talk about it so alright so I'm [00:42:39] going to talk about it so alright so I'm going to talk about a bunch of [00:42:40] going to talk about a bunch of properties of me of Pi Max and timing [00:42:46] properties of me of Pi Max and timing which is what we have been referring to [00:42:48] which is what we have been referring to as the minimax value okay so I want to [00:42:52] as the minimax value okay so I want to talk about this a little bit so the [00:42:55] talk about this a little bit so the first property that that we can have is [00:42:57] first property that that we can have is is that V of Pi max time in is it is [00:43:10] is that V of Pi max time in is it is going to be an upper bound of any other [00:43:18] going to be an upper bound of any other value of any other policy pi of I'm [00:43:22] value of any other policy pi of I'm gonna just write PI of expecting max for [00:43:23] gonna just write PI of expecting max for any other policy for the agent assuming [00:43:27] any other policy for the agent assuming that my opponent is playing as a [00:43:30] that my opponent is playing as a minimizer okay so so what I'm writing so [00:43:35] minimizer okay so so what I'm writing so what I'm writing here is is the value is [00:43:36] what I'm writing here is is the value is going to be an upper bound of any other [00:43:39] going to be an upper bound of any other value if my agent decides to do anything [00:43:41] value if my agent decides to do anything else under the assumption that my [00:43:44] else under the assumption that my opponent is a minimizer so my opponent [00:43:46] opponent is a minimizer so my opponent is really trying to get me 
if my [00:43:47] is really trying to get me if my opponent is really trying to get me then [00:43:49] opponent is really trying to get me then the best thing I can do is to maximize [00:43:51] the best thing I can do is to maximize okay so that's kind of intuitive right [00:43:53] okay so that's kind of intuitive right that's an upper bound let's look at that [00:43:55] that's an upper bound let's look at that example so what is PI V of Pi mix PI max [00:43:59] example so what is PI V of Pi mix PI max and PI min so so we just talked about [00:44:01] and PI min so so we just talked about that right so if this guy is a minimizer [00:44:03] that right so if this guy is a minimizer we're gonna get minus 50 here 1 here [00:44:06] we're gonna get minus 50 here 1 here minus 5 here if this guy is a Maximizer [00:44:09] minus 5 here if this guy is a Maximizer what is the value I'm gonna get get it 1 [00:44:12] what is the value I'm gonna get get it 1 right I'm gonna go down here and then [00:44:13] right I'm gonna go down here and then I'm gonna get one so V of PI max and [00:44:17] I'm gonna get one so V of PI max and timing is just equal to 1 that is this [00:44:19] timing is just equal to 1 that is this value that is just equal to 1 okay what [00:44:23] value that is just equal to 1 okay what is this saying is that this is going to [00:44:26] is this saying is that this is going to be greater than [00:44:28] be greater than maybe the setting where my opponent [00:44:32] maybe the setting where my opponent sorry my my agent is following expecting [00:44:34] sorry my my agent is following expecting max and my opponent is still doing [00:44:36] max and my opponent is still doing timing so so what would this correspond [00:44:38] timing so so what would this correspond to what would this value correspond to [00:44:40] to what would this value correspond to so this is a value which says well I'm [00:44:45] so this is a value which says well I'm going to take an action 
assuming my [00:44:47] going to take an action assuming my opponent is acting as stochastically if [00:44:50] opponent is acting as stochastically if my opponent is acting stochastically I'm [00:44:52] my opponent is acting stochastically I'm gonna get zero here I'm gonna get two [00:44:54] gonna get zero here I'm gonna get two here I'm gonna get five here if I'm [00:44:56] here I'm gonna get five here if I'm assuming that and I'm trying to maximize [00:44:58] assuming that and I'm trying to maximize my own my own value which trout do I go [00:45:01] my own my own value which trout do I go I'm gonna go at this trout but it turns [00:45:04] I'm gonna go at this trout but it turns out that my opponent was not doing that [00:45:06] out that my opponent was not doing that my opponent was actually a minimizer so [00:45:09] my opponent was actually a minimizer so if my opponent was actually a minimizer [00:45:10] if my opponent was actually a minimizer and I went this route my opponent is [00:45:14] and I went this route my opponent is going to give me minus 5 so the value [00:45:16] going to give me minus 5 so the value I'm gonna end up getting is minus 5 so [00:45:19] I'm gonna end up getting is minus 5 so this is equal to minus 5 this is equal [00:45:23] this is equal to minus 5 this is equal to minus y so so far I've shown that [00:45:27] to minus y so so far I've shown that this guy is greater than this guy all [00:45:32] this guy is greater than this guy all right so that's the first property first [00:45:34] right so that's the first property first property is if my opponent is terrible [00:45:35] property is if my opponent is terrible and is trying to get me best thing I can [00:45:37] and is trying to get me best thing I can do is to maximize I shouldn't do [00:45:39] do is to maximize I shouldn't do anything else okay the second property [00:45:42] anything else okay the second property is is that this is V of Pi knocks again [00:45:47] is is that this is V of 
[00:45:52] V of pi max and pi min is now a lower bound for the setting where your agent is maximizing assuming your opponent is minimizing, but your opponent was actually not minimizing; your opponent was following pi 7. So what this says is: if you're trying to maximize assuming your opponent is always minimizing, then you'll come up with a lower bound, and if your opponent ends up doing something else, you can always do better than this lower bound. [00:46:25] Okay, so what is this V equal to? Well, we just showed that it is equal to 1; that is this value. And what does this correspond to? This is the value of pi max, which is: I'm going to assume you're trying to get me, and if I assume you're trying to get me, I'm going to go this route, because that is the thing that gives me the highest value.
[00:46:46] But you're not trying to get me; you're following pi 7. If you're following pi 7, you're just going to give me one half the time and three half the time, and that corresponds to two. So I'm going to get value two instead of value one; this is actually equal to two in this case, and this corresponds to this value in the table, which is: the agent is following a maximizer assuming the opponent is a minimizer, but the opponent was not a minimizer; the opponent was just following pi 7, and this is just equal to two. [00:47:17] Okay, so far the things I've shown are actually very intuitive. They seem a little complicated, but they're very intuitive. What I've shown is that this value of minimax is an upper bound if you're assuming your opponent is a terrible opponent: it's going to be an upper bound because the best thing I can do is maximize. I've also shown it's a lower bound if my opponent is not as bad.
[00:47:40] So that's what I've shown so far. [00:47:45] [Student question, partly inaudible: is the opponent's policy completely hidden from the agent?] Yeah, so here the agent actually doesn't see what the opponent does, right? Even in the expectimax case, it thinks the opponent is going to follow pi 7, but maybe the opponent follows pi 7 and maybe not. So when we talk about expectimax and minimax, it's always the case that the agent doesn't actually see what the opponent does, but the agent can reason about what the opponent might do. [00:48:13] I'm going to talk about one more property, and this last property, which actually goes back to your question, basically says: if you know something about your opponent, then you shouldn't follow that minimax policy; you should actually do the thing that uses some knowledge of what your opponent does.
[00:48:29] So that basically says: this V of pi max and some pi opponent, where you know something about pi opponent (you know that the opponent is playing pi 7), is going to be less than or equal to the case where you are following pi expectimax of 7 and the opponent actually follows pi 7. [00:49:02] So what is this last inequality saying? Well, it's saying that the case where you're trying to maximize and you think your opponent is minimizing, but your opponent is actually not minimizing, has a value less than the case where you're maximizing under some knowledge of your opponent's policy, and your opponent's policy actually ended up doing that. [00:49:23] So the first term is always the agent, and the second term is always the opponent. This first value we have already computed;
[00:49:30] that's equal to 2. And what is this other value saying? It's saying you are going to maximize assuming your opponent is stochastic. If I'm assuming my opponent is stochastic, then I'm assuming this is 0, this is 2, this is 5, and I'm trying to maximize, so which route should I go? I should go this route, because that gives me 5. So this is the agent thinking the opponent is going to be stochastic, thinking it's going to get 5, and the opponent actually ends up following pi 7, which is that stochastic policy, so we are actually going to get 5; this is equal to 5. [00:50:11] And this is the last inequality that we have: V of pi expectimax of 7 and pi 7 is greater than or equal to V of pi max and pi 7, and we just showed this is equal to 5 for this example. Okay, all right. [00:50:37] [Student question, partly inaudible, about whether the opponent's policy has to be stochastic.]
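The chain of values worked through above can be checked numerically. This is a minimal sketch, assuming the three-bin payoffs from the running example, {-50, 50}, {1, 3}, and {-5, 15} (these reproduce the 0/2/5 averages and the -5, 1, 2, 5 values quoted in the lecture); the variable names are my own.

```python
# Three-bin game: the agent picks a bin, then the opponent picks one
# of the two numbers in it (assumed payoffs from the lecture example).
bins = [[-50, 50], [1, 3], [-5, 15]]

min_vals = [min(b) for b in bins]        # opponent minimizes
avg_vals = [sum(b) / 2 for b in bins]    # opponent plays pi_7 (uniform coin)

pi_max = min_vals.index(max(min_vals))       # best bin vs. a minimizer -> bin 1
pi_exptmax7 = avg_vals.index(max(avg_vals))  # best bin vs. pi_7        -> bin 2

V_exptmax_min = min_vals[pi_exptmax7]    # wrong model, adversarial foe: -5
V_max_min = min_vals[pi_max]             # minimax value:                 1
V_max_7 = avg_vals[pi_max]               # minimax policy vs. pi_7:       2.0
V_exptmax_7 = avg_vals[pi_exptmax7]      # correct model of pi_7:         5.0

# The chain of properties from the lecture holds on this example:
assert V_exptmax_min <= V_max_min <= V_max_7 <= V_exptmax_7
```

Each property follows the same pattern: the first policy in V(., .) is the agent's, the second is what the opponent actually does.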
[00:50:50] So if you know something about the opponent, it doesn't have to be stochastic. Here I knew that the opponent was following this stochastic policy of one half, one half; I might instead have known that the opponent is following a deterministic policy and always picks the left one, and I could have followed the same kind of expectimax policy under that knowledge. It could be anything else. The whole idea of expectimax is that I have some knowledge of what the policy of the opponent is; it could be a stochastic policy, it could be a deterministic policy, and under that knowledge I maximize. [00:51:19] [Student: is the bottom-right value always greater than the bottom-left one?] Yeah, so the question is whether that inequality always holds. We have this chain of inequalities, so transitively this quantity is always greater than that one, and that kind of makes sense, right?
[00:51:36] This last one kind of makes sense: it's basically saying that if you're following expectimax, and you know something about your opponent, and your opponent actually ended up doing that, then your value should be greater than pretty much anything, because you knew something about the opponent and you played having that knowledge. [00:51:53] [Student question, partly inaudible, about what it means to know the opponent's policy.] It's knowing what actions they're going to take, right? Here I knew what the opponent would do: I knew that half the time they're going to take this one and half the time they're going to take the other one, and then I used that knowledge. [00:52:10] [Student: what is the expectimax policy given that your opponent is following a minimizer policy; if your opponent is following pi min, is it the same as doing minimax?]
[00:52:24] So the expectimax policy is this policy here: the expectimax policy assumes your opponent is following pi opponent and assumes that it has access to pi opponent, so it ends up doing this sum over here. [00:52:38] Right, so you're saying: if pi opponent is actually pi min, do they end up being equal to each other? Yes. If you know your opponent is following pi min, it just becomes minimax: expectimax against a known minimizer is exactly minimax. All right, so I'm going to move ahead a little bit. [00:52:58] This is what we have already talked about. Okay, so a few other things about modifying this game. We have talked about this game, and we have talked about properties of this game. There's a simple modification one can do, which is to bring nature in. There was a question earlier about whether there is any chance here, and yes, you can actually bring chance into this setup.
[00:53:19] So let's say that you have the same game as before: you're choosing one of the three bins, and after choosing one of the three bins, you flip a coin, and if heads comes up, you move one bin to the left, with wraparound. What this means is that 50% of the time tails comes up and you're not changing anything, you have the same setup; 50% of the time you get heads, and in those cases you're just going to get a neighboring bin instead of your chosen bin. [00:53:47] So you're adding this notion of chance here, and it's kind of acting as a new player, but it's not actually making things that much more complicated. What happens is that, in some sense, we have a policy of the coin, which is nature here: half the time I get 0 and don't change anything, and half the time I just get the neighboring bin instead of my main bin.
[00:54:09] And then I get this new tree, where I have a whole new level for where the chance player goes. So now we have max nodes, we have min nodes, and we also have these chance nodes here, and the chance nodes sometimes take me to the original bucket and 50% of the time take me to a neighboring bucket. But the whole story stays the same; nothing changes. You can still compute value functions, you can still push the value functions further up the tree; it's the same sort of recurrence, and nothing fundamental changes. It just feels like there are three things playing now. [00:54:43] So this is actually called expectiminimax. The value of expectiminimax here, in this case for example, is minus two, because there is a min node for the opponent, there's an expectation node for what nature does, and then there is a max node for what the agent should do.
[00:55:01] That's why it's called expectiminimax, and then you can actually compute the value the same way. [00:55:09] [Student: so there are two players; I pick a bin, then you flip the coin and it shifts left or not, and then the opponent gets to take the number?] Yes, there are still two players, and then there's the third thing, the coin. [00:55:26] All right, so the way to formalize this is: you have players, so you have an agent, you have an opponent, and you have the coin, and then the recurrence changes a little bit. The recurrence we had for minimax was just the max and the min, and it would return the utility if you're at an end state. Now, if it is the coin's turn, we just do a sum, an expectation over the policy of the coin, which is what we were doing in expectiminimax; we just have a new return value for when the coin plays.
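The expectiminimax recurrence just described can be sketched as a short recursive function. This is a minimal sketch: the nested-dict node format and the particular tree are my own illustration, with a max-over-chance-over-min shape like the coin-flip game and numbers chosen so the root value comes out to -2, matching the value quoted above.

```python
# Expectiminimax over a small game tree. Node format (my own, for this
# sketch): a leaf holds a utility; "max"/"min" nodes hold children;
# an "exp" (chance) node holds (probability, child) pairs.

def expectiminimax(node):
    kind = node["type"]
    if kind == "leaf":
        return node["utility"]
    children = node["children"]
    if kind == "max":                      # agent's turn: maximize
        return max(expectiminimax(c) for c in children)
    if kind == "min":                      # opponent's turn: minimize
        return min(expectiminimax(c) for c in children)
    if kind == "exp":                      # nature's turn: expectation
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node type {kind!r}")

def leaf(u):
    return {"type": "leaf", "utility": u}

# A max -> chance -> min tree, the same shape as the coin-flip game.
tree = {"type": "max", "children": [
    {"type": "exp", "children": [
        (0.5, {"type": "min", "children": [leaf(1), leaf(3)]}),
        (0.5, {"type": "min", "children": [leaf(-5), leaf(15)]}),
    ]},
    {"type": "exp", "children": [
        (0.5, {"type": "min", "children": [leaf(-50), leaf(50)]}),
        (0.5, {"type": "min", "children": [leaf(1), leaf(3)]}),
    ]},
]}
print(expectiminimax(tree))  # -2.0: max(0.5*1 + 0.5*(-5), 0.5*(-50) + 0.5*1)
```

The only change from plain minimax is the "exp" branch; max and min nodes are untouched, which is why nothing fundamental changes in the recurrence.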
[00:55:58] So everything here follows naturally in terms of what we'd expect. [00:56:03] Okay, all right, so the summary so far: we've been talking about max nodes, we've been talking about chance nodes, like what happens if you have a coin in there, and also these min nodes, and basically we've been talking about composing these sorts of nodes together and creating a minimax game or an expectimax game. And for the value function, we just do the usual recurrence that we have been doing in this class, going from the utility to compute this expected utility value for all the nodes that we have. [00:56:35] There might be other scenarios that you might want to think about, for example for your projects; in general, there are other variations of games that you might want to think about.
[00:56:46] For example, what if you're playing with multiple opponents? So far we have talked about the two-player setting, where we have one opponent and one agent, but if you have multiple opponents, you can think about how the tree changes in those settings. Or, for example, the turn-taking aspect of it: what if the game is simultaneous rather than turn-taking? Or you can imagine settings where you have some actions that allow you to take an extra turn, so you take two turns and then the next person takes a turn. You should think about some of these; some of them come up in the homework. Think about variations of games in general; they're kind of fun. [00:57:21] So, to talk a little bit about the computational aspects of this: this is pretty bad. We are talking about a game tree, which is similar to tree search, so we're taking a tree search approach.
[00:57:35] If you remember tree search and the algorithms we were using there: if you have a branching factor of b and some depth of d, then in terms of time it's exponential, on the order of b to the 2d in this case. I'm using 2d because the agent plays and then the opponent plays; that's how I'm counting it, so for every d we have two plies. [00:58:03] In terms of space it's order d, and in terms of time it's exponential, which is pretty bad. For a game like chess, for example, the branching factor is around 35 and the depth is around 50, so if you compute b to the 2d, it's on the order of the number of atoms in the universe. That's not doable; we are not able to use any of these methods directly. So how do we make things faster?
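The chess estimate above can be sanity-checked on a log scale. A quick sketch, using b = 35 and d = 50 as the rough numbers from the lecture:

```python
import math

b, d = 35, 50                          # rough chess numbers from the lecture
log10_nodes = 2 * d * math.log10(b)    # log10 of b^(2d)
print(round(log10_nodes))              # about 154, i.e. roughly 10^154 nodes
# For comparison, the observable universe has on the order of 10^80 atoms.
```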
[00:58:33] There are two approaches that we talk about in this class to make things faster. The first approach is using an evaluation function: we can use domain-specific knowledge about the game to define almost like features of the game, in order to approximate this value function at a particular state. I'm going to talk about that a little bit. [00:58:56] And then the other approach, which is kind of simple and kind of nice, is called alpha-beta pruning. The alpha-beta pruning approach basically gets rid of part of the tree if it realizes you don't need to go down that part of the subtree. So it's a pruning approach that doesn't explore all of the tree; it only explores parts of the tree. We're going to talk about both of them.
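Alpha-beta pruning is covered in detail later in the course; as a preview, the idea of skipping subtrees can be sketched like this. A minimal sketch of my own, on a nested-list tree where a number is a terminal utility and a list holds a node's children:

```python
def alphabeta(node, alpha, beta, is_max):
    # alpha: best value the maximizer can guarantee so far;
    # beta: best value the minimizer can guarantee so far.
    if isinstance(node, (int, float)):
        return node
    if is_max:
        v = float("-inf")
        for c in node:
            v = max(v, alphabeta(c, alpha, beta, False))
            alpha = max(alpha, v)
            if alpha >= beta:   # the minimizer would never allow this branch
                break           # prune the remaining children
        return v
    else:
        v = float("inf")
        for c in node:
            v = min(v, alphabeta(c, alpha, beta, True))
            beta = min(beta, v)
            if alpha >= beta:
                break
        return v

tree = [[-50, 50], [1, 3], [-5, 15]]  # the three-bin game as a tree
print(alphabeta(tree, float("-inf"), float("inf"), True))  # 1, same as minimax
```

On this tiny example, once the third bin reveals the -5 leaf, its remaining leaf is never examined: the answer is identical to minimax, just computed over fewer nodes.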
[00:59:22] All right, so evaluation functions; let's talk about that. The breadth and depth of the game can be really large, and that's not great. One approach to the problem is to limit the depth: instead of exploring everything in the tree, just limit the depth, search down to that particular depth, and when you get there, call an evaluation function. [00:59:46] If you were to search the full tree, this was the recursion that we had, that we've talked about: if you're doing a minimax approach, this is the recursion you've got to do, going over all the states and actions, over the whole tree. But if you're using a limited-depth tree search approach, what you can do is have this depth d and then decrement d every time you go through an agent move and an opponent move as you go down the tree.
[01:00:13] At some point d becomes zero, so you get to some particular depth of the tree, and when d becomes zero, you're going to call an evaluation function on the state that you reach. [01:00:23] And this evaluation function is of almost the same form as the future cost we were talking about for search problems, right? If you knew it exactly, then you'd be done, but you don't know exactly what it is, because if you knew that, you would have solved the whole tree search problem. In general, though, you can have some sort of weak estimate of what the future value would be. So an evaluation function Eval(s) is a weak estimate of V_minimax(s); it's a weak estimate of your value function. [01:01:04] The analogy to that is FutureCost(s) in search problems. So how do we come up with an evaluation function?
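The depth-limited recurrence just described can be sketched as follows. A minimal sketch in which the nested-list tree format and the crude leaf-averaging Eval are my own illustration, not the lecture's:

```python
# Depth-limited minimax: d counts agent/opponent ply pairs and is
# decremented after each opponent reply; at d == 0 we call Eval instead
# of recursing. A number is a terminal utility; a list holds children.

def minimax_dl(node, d, is_max, eval_fn):
    if isinstance(node, (int, float)):
        return node                      # end state: return the utility
    if d == 0:
        return eval_fn(node)             # depth cutoff: weak estimate
    next_d = d if is_max else d - 1      # decrement after the opponent moves
    vals = [minimax_dl(c, next_d, not is_max, eval_fn) for c in node]
    return max(vals) if is_max else min(vals)

def avg_leaves(node):
    """A crude Eval(s): the average of all leaf utilities under the node."""
    if isinstance(node, (int, float)):
        return node
    vals = [avg_leaves(c) for c in node]
    return sum(vals) / len(vals)

tree = [[-50, 50], [1, 3], [-5, 15]]          # the three-bin game as a tree
print(minimax_dl(tree, 1, True, avg_leaves))  # 1: full depth reaches the leaves
print(minimax_dl(tree, 0, True, avg_leaves))  # ~2.33: Eval guess at the root
```

With d large enough to reach the leaves this is exact minimax; with d = 0 the answer is only as good as the evaluation function, which is exactly the trade-off being described.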
So how do we come up with an evaluation function? [01:01:12] We do it in a manner similar to what we did in the learning lecture, where we come up with features and weights for those features. If I'm playing chess, the way we play is that we think about a set of actions we could take and where we'd end up, and based on where we end up, we evaluate how good that board is. We have some notion of features, of how good the board would be from that point on, and that lets us evaluate which action to pick. When we play chess, that's roughly what we do: we pick a couple of actions and see what the board would look like after taking them. An evaluation function does the same thing: it tries to figure out what the things are that we should care about in a specific game, in this case chess, and then gives values to them.
[01:01:57] So it might be things like the number of pieces we have, the mobility of those pieces, whether our king is safe, or whether we have central control. For example, for the pieces, we can look at the difference between the number of pieces we have and the number our opponent has. The number of kings I have versus the number of kings my opponent has seems really important, because if I don't have a king and my opponent does, then I've lost the game, so you might put a really large weight on that feature. You might also care about the difference in the number of pawns, or queens, or the other types of pieces on the board; that lets you think about how good the board is. Or you can look at the number of legal moves you have and the number of legal moves your opponent has, which gives you some notion of the mobility of that state.
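Features and weights like these can be combined into a linear evaluation function. The `Board` stub, the feature set, and the weights below are all illustrative assumptions, not a real chess evaluator:

```python
# A hand-crafted linear evaluation: weighted features of the position.
# Board, the features, and the weights are illustrative assumptions.

PIECE_WEIGHTS = {"K": 1e6, "Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

class Board:
    """Minimal stand-in for a position: piece counts and legal-move counts."""
    def __init__(self, my_pieces, opp_pieces, my_moves, opp_moves):
        self.my_pieces, self.opp_pieces = my_pieces, opp_pieces
        self.my_moves, self.opp_moves = my_moves, opp_moves

def evaluate(board):
    # Material: huge weight on the king (losing it loses the game),
    # then queens, rooks, minor pieces, pawns.
    material = sum(w * (board.my_pieces.get(p, 0) - board.opp_pieces.get(p, 0))
                   for p, w in PIECE_WEIGHTS.items())
    # Mobility: difference in the number of legal moves.
    mobility = 0.1 * (board.my_moves - board.opp_moves)
    return material + mobility

# Up a queen but slightly less mobile: still clearly ahead.
b = Board({"K": 1, "Q": 1, "P": 8}, {"K": 1, "P": 8}, my_moves=30, opp_moves=35)
print(evaluate(b))  # 9 - 0.5 = 8.5
```

The lecture's point about learning applies here: these weights are hand-set, but they could just as well be fit from data.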
[01:02:43] OK, so, summary so far: O(b^(2d)) is pretty bad, and an evaluation function basically tries to estimate the minimax value using some domain knowledge. Unlike A*, we actually don't have any guarantees on the error of this sort of approximation; but it is an approximation, people use it, and it's pretty good. We'll talk a little bit next time about what sort of weights we should pick for each of these features; you should think learning when you think about what weights we are using. All right, so now I want to spend a bit of time on alpha-beta pruning, because this is important.
[01:03:41] The concept of alpha-beta pruning is also pretty simple, but I think it's one of those things where you should pay close attention to really get what is happening. So let's say that you want to choose between some bucket A and some bucket B, and you want the maximum value. You know that the values in A fall into the range three to five and the values in B fall into five to ten, so the ranges don't really intersect. In that case, if you're picking a maximum, you shouldn't care about the rest of bucket A, because you already know you're happy with B; you shouldn't even look at A. So the underlying concept of alpha-beta pruning is maintaining a lower bound and an upper bound on values,
[01:04:32] and if the intervals don't overlap, basically dropping the part of the subtree that you don't need to work on, because there's no overlap there. So here's an example. Let's say we have these max nodes and min nodes, and you go down and see a three; this is a min node, so you're going to get three here. So when I get to the max node here, what I know is that the max node is going to get three or higher. That's something I know without even looking at anything on the other side: having looked at the subtree on the left, I already know this max node should get three or higher. OK, so then when I go down to this min node and I see a two, I know this is a min node, so it's going to get a value that's less than or equal to two, and less than or equal to two has no overlap with
[01:05:28] greater than or equal to three, so I shouldn't worry about that subtree. Did everyone see that? Maybe let me draw it here. [01:05:42] That's basically the whole concept of what happens in alpha-beta pruning. I have this max node; this child was three and this one was five, so I found that this node is three. The parent is a max node: whatever it gets is going to be greater than or equal to three, because it has already seen a three, so it's not going to take any value less than three. We know whatever value we get at this max node is going to be three or higher. Then I go down here and I see a two. It's a min node: whatever it gets is going to be less than or equal to two, so less than or equal to two is the value that would get popped up here. And I already know that less than or equal to two has no overlap with three or greater, so I don't
need to worry about this like I [01:06:37] even need to worry about this like I like I can completely ignore this side [01:06:40] like I can completely ignore this side of the tree I don't need to know [01:06:41] of the tree I don't need to know whatever is happening down here I don't [01:06:43] whatever is happening down here I don't even need to look at that okay cuz cuz I [01:06:46] even need to look at that okay cuz cuz I like this value should be greater than [01:06:48] like this value should be greater than applause sorry [01:06:57] now minimum so it's a minimum it's a [01:07:00] now minimum so it's a minimum it's a minimum note right so it's going to be [01:07:03] minimum note right so it's going to be your less than or equal it's a mid note [01:07:09] your less than or equal it's a mid note so I saw two if I see ten here or twenty [01:07:12] so I saw two if I see ten here or twenty here like I'm not gonna pick that like [01:07:13] here like I'm not gonna pick that like it's two or all right so yeah so if it [01:07:22] it's two or all right so yeah so if it is 10 or 100 or whatever substrate is [01:07:24] is 10 or 100 or whatever substrate is there like we're not gonna look at that [01:07:26] there like we're not gonna look at that so that that is kind of the whole [01:07:28] so that that is kind of the whole concept [01:07:30] concept all right so okay so the key idea of [01:07:37] all right so okay so the key idea of alpha-beta pruning is as we're like an [01:07:39] alpha-beta pruning is as we're like an optimal path is going to get to some [01:07:41] optimal path is going to get to some leaf node that has some utility and that [01:07:43] leaf node that has some utility and that utility is the thing that is going to be [01:07:46] utility is the thing that is going to be pushed up like like and then the [01:07:49] pushed up like like and then the interesting thing is if you pick the [01:07:51] interesting thing is if you pick the optimal path the value of 
[01:07:53] the values of the nodes on that optimal path are all going to be equal to each other: that utility gets pushed up all the way to the top. Because of that, we can't have a situation where the intervals have no intersection, because if this were the optimal path, the value at this node would have to be the same as the value at this node, the same as the value at that node, and so on. If the intervals don't overlap, there's no way those nodes have the same value, and no way for that path to be the optimal path. That's the reason this works: along the optimal path you have the same value throughout. All right, so how do we actually do this? The way we do it is we're going to keep a lower bound on max nodes,
[01:08:47] which I'm going to call a_s. So we have a_s, a lower bound on max nodes, [01:08:57] and we keep track of that. You're also going to keep track of b_s, which is an upper bound on min nodes. If the intervals don't overlap, we just drop that subtree; if they do overlap, we just keep updating a_s and b_s. OK, so here's an example. Let's say we start with this top node, and somehow we've found out that this top node should be greater than or equal to six. So that's my a_s value: a_s is equal to six, and it's a lower bound on my max node; I know the optimal value is going to be something greater than or equal to six. Then somehow we get to this min node and we realize that this min node should be less than or equal to eight. So you're here, and let's say the eight is here.
[01:10:04] You still have some overlap, so you're all good. So b_s is going to be equal to eight: we have an upper bound on the min node, and that upper bound is eight, so the value on the optimal path is going to be less than or equal to eight. So far so good. Then somehow I find out that this one is greater than or equal to three. Greater than or equal to three is fine, since we're still greater than or equal to six. Calling these states s1, s2, s3, my a_{s3} is equal to three, because I know this node needs to be greater than or equal to three; but the six already does the job, so I don't need to worry about that three. And then for this last node, I'm at this min node and I realize that b_{s4} is equal to five, and what that tells me is that the value should be less than or equal to five,
[01:11:06] so I'm going to update less than or equal to eight down to less than or equal to five. [01:11:12] And now I don't have any overlap, and what that tells me is that this path is not going to be the optimal path: with no overlap, we're not going to find that single number that is going to be the utility. So I can actually ignore that whole subtree, because it's not going to be on my optimal path; I can get rid of it. Yes? So the question is whether we're ignoring the three in a different way. Yes: we're ignoring the value of three because it's already encoded here, but we're ignoring the subtree under the five in the sense of not exploring it at all. I did need to explore things after the three, because with the three we still had an overlap with the beta. So with the b value, you're looking at the overlap between your upper bound on the min node
[01:12:07] and your lower bound on the max node; that's the interval you're making sure still has values in it. If the two were a three instead, we'd just ignore it anyway, because you already have something else covering it. If the three were a two, is that what you're saying? Yes, you want to have non-trivial intervals, basically; if you see the same value, you still don't have a non-trivial interval. And where did we get the six and the eight? This is an example I made up; we'll talk about examples where we actually compute these bounds, but for now just assume that somehow we've found them. [01:12:57] Another question: why is b an upper bound? It seems like a lower bound. So, I'm not showing a full example here, and the actual
[01:13:11] values are coming from somewhere that I'm not talking about yet. Oh, the one at the top? OK, sorry, yes. So the one at the top is a min node, just like in the earlier example: at my min node I found that the minimum between three and five is three. The max node is then maximizing between three and a bunch of other things; that's what it's supposed to do. And if it's maximizing between three and a bunch of other things, then it's at least going to be three. There's no way for it to be two, and it's not going to be zero, because it takes the maximum of three and something else. That's why I say that whatever value I get at this max node is going to be greater than or equal to three. So now I come down here and I see this two; this is a min node.
[01:14:06] The value here is going to be the minimum between two and whatever is down this tree, so, in the roundabout way we said it, it's going to be two or lower. What we get here is going to be two or lower: I'm either going to get 2 or 1 or 0 or so on, and that's the value that would get pushed up here. So the value coming from down here is going to be 2 or lower, and if I'm maximizing between 3 and something that is 2 or lower, then the 3 is enough. I can figure that out from these intervals and not look at that side of the tree: once I've seen this two, I already know there's no non-trivial interval between a value that's greater than or equal to 3 and a value that's less than or equal to 2, so I can just not worry about that part.
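The overlap test applied repeatedly above can be written down directly. `can_prune` is an assumed helper name; the convention that a single shared point also prunes follows the "non-trivial intervals" remark in the lecture:

```python
# The overlap test, written out. can_prune is an assumed helper name.
# A max-node lower bound alpha and a min-child upper bound beta leave a
# non-trivial interval only when beta > alpha.

def can_prune(alpha, beta):
    """True when [alpha, +inf) and (-inf, beta] have no non-trivial overlap."""
    return beta <= alpha

print(can_prune(alpha=3, beta=2))  # True: the example above, prune the subtree
print(can_prune(alpha=3, beta=8))  # False: overlap [3, 8], keep exploring
print(can_prune(alpha=3, beta=3))  # True: a single shared point is trivial
```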
[01:15:10] All right, one quick implementation note. We talked about these a values and b values; you can actually keep track of just one value each, and those are the alpha value and the beta value. Let me write it here: alpha(s) is the max of a_{s'} over all the nodes s' seen so far above s. What this basically says is, remember when we saw the three we said it was already covered, we already knew more; it's the same idea. So alpha(s) is just going to be one value, in this case six, because when I see the three I don't really care about it: I already know I'm greater than or equal to six, and knowing I'm greater than or equal to three adds nothing. So we keep track of one value, alpha(s), which here is just equal to six. And we do a similar thing for beta: we keep track of beta(s).
[01:16:06] Beta(s) is just the minimum of b_{s'}, where again s' ranges over the nodes seen so far in that ordering. So you have beta(s), and you look at the intervals, alpha(s) and above versus beta(s) and below, and if those intervals have no non-trivial intersection, then you can prune that part of the tree. This is more of an implementation detail: instead of keeping track of all these a_s and b_s values, you just keep one alpha and one beta. All right, let's look at one other example; I'll do this one real quick. We start from some top node and go down to this node, which is a min node between nine and seven. So it's a min node, I'm going to get a seven here, and I realize that this max node is going to be something that's at least seven.
[01:17:13] It's going to be something greater than or equal to seven, so my alpha is going to be seven. Now I know that whatever value the top node gets, it's got to be 7 or higher. So now I come down here and I'm at a min node; I see a 6. It's a min node, so whatever we get here is going to be less than or equal to 6, 6 or something lower. That tells me my beta is equal to 6: whatever I'm getting at that min node is going to be 6 or lower, and that has no intersection with my alpha(s). So I can just not do anything with this branch; I don't need to go over all these other things, I can ignore this whole bunch. All right, so now I go back up and come down here, and I'm at a min node.
so remember the way we were computing [01:18:17] so remember the way we were computing these beta values we were based on the [01:18:19] these beta values we were based on the notice that we have seen previously so I [01:18:21] notice that we have seen previously so I have a new beta now cuz I'm done with [01:18:23] have a new beta now cuz I'm done with this branch right so I need to get here [01:18:25] this branch right so I need to get here here I have a min between what is it 8 8 [01:18:30] here I have a min between what is it 8 8 and 3 so okay so so I see my maybe let [01:18:34] and 3 so okay so so I see my maybe let me just rate I see my 8 here [01:18:36] me just rate I see my 8 here it's a min node so it's going to be less [01:18:38] it's a min node so it's going to be less than or equal to 8 so my new beta value [01:18:41] than or equal to 8 so my new beta value is going to be 8 my alpha is still 7 [01:18:46] is going to be 8 my alpha is still 7 because that's for my top note so it's 8 [01:18:48] because that's for my top note so it's 8 or lower we do have an interval [01:18:52] or lower we do have an interval overlapping interval 7 to 8 everything [01:18:54] overlapping interval 7 to 8 everything is good so I actually need to go and see [01:18:57] is good so I actually need to go and see what this value is this value is 3 so I [01:19:01] what this value is this value is 3 so I get 3 here or like it's exactly equal to [01:19:04] get 3 here or like it's exactly equal to 3 so that updates my beta from 8 to 3 [01:19:09] 3 so that updates my beta from 8 to 3 we'll have already explored that part of [01:19:11] we'll have already explored that part of the tree anyways but 3 you don't have an [01:19:14] the tree anyways but 3 you don't have an interval if there were a bunch of things [01:19:16] interval if there were a bunch of things below this 3 like I like a nice somehow [01:19:19] below this 3 like I like a nice somehow sound it's not like I wouldn't need to 
[01:19:20] sound it's not like I wouldn't need to explore it but we don't really have that [01:19:21] explore it but we don't really have that and then we just find that our optimal [01:19:24] and then we just find that our optimal value 7 so we just return something okay [01:19:27] value 7 so we just return something okay and we did an explore this giant middle [01:19:30] and we did an explore this giant middle of the tree okay one more slide and [01:19:34] of the tree okay one more slide and enough two more two more quick one quick [01:19:36] enough two more two more quick one quick idea okay so yeah alright so the order [01:19:40] idea okay so yeah alright so the order of things actually matters so the only [01:19:42] of things actually matters so the only thing I want to mention about this idea [01:19:43] thing I want to mention about this idea of pruning is this order of things [01:19:45] of pruning is this order of things matter so so when you look at this [01:19:47] matter so so when you look at this example remember we didn't explore [01:19:48] example remember we didn't explore anything about the ten because we [01:19:50] anything about the ten because we already knew that this value needs to be [01:19:52] already knew that this value needs to be greater than equal to three these are my [01:19:54] greater than equal to three these are my buckets right if I swap the buckets like [01:19:56] buckets right if I swap the buckets like if I just swap the order of buckets I [01:19:58] if I just swap the order of buckets I moved the to ten bucket to this side [01:19:59] moved the to ten bucket to this side three five pocket to the other side I [01:20:01] three five pocket to the other side I wouldn't be able to do that I actually [01:20:03] wouldn't be able to do that I actually need to explore the whole tree because [01:20:06] need to explore the whole tree because my alpha and beta [01:20:07] my alpha and beta wouldn't have the same properties so the [01:20:09] 
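The pruning logic walked through above can be sketched in code. A caveat: the exact leaf values of the lecture's tree aren't fully recoverable from the transcript, so the tree below is a hypothetical stand-in that reproduces the same reasoning: the first min branch yields 7 (alpha becomes 7), the second branch is pruned as soon as a 6 is seen, and the third explores 8 and then 3, leaving 7 as the answer.

```python
import math

visited = []  # leaves actually evaluated, to show what pruning skips

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax with alpha-beta pruning over a nested-list game tree."""
    if isinstance(node, int):          # leaf: just return its value
        visited.append(node)
        return node
    if maximizing:                     # max node: raises alpha
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, v)
            if alpha >= beta:          # intervals no longer overlap: prune
                break
        return v
    else:                              # min node: lowers beta
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True))
            beta = min(beta, v)
            if alpha >= beta:
                break
        return v

# max root over three min nodes (hypothetical values, see note above)
tree = [[7, 9], [6, 100, 100], [8, 3]]
print(alphabeta(tree))   # 7
print(visited)           # [7, 9, 6, 8, 3] -- the 100s were never looked at
```

This is also where move ordering helps: if the `[8, 3]` branch listed 3 first, beta would drop to 3 immediately and the 8 would be skipped as well.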
[01:20:09] So the order in which you put things in the tree actually matters, and you should care about that. Worst-case scenario, our ordering is terrible, so we need to actually go over the full tree; that's O(b^(2d)), the worst case. And there is a best ordering, where you effectively only pay for half the depth: best ordering is going to be O(b^(2·d/2)). So if with the worst ordering you could explore up to depth 10, then with the best ordering you can actually explore up to depth 20; that's a huge improvement. And then random ordering turns out to be pretty okay too: random ordering would be O(b^(2·(3/4)·d)). So even if you had a random ordering, it would be better than the worst-case scenario. Well, how do you figure out what a good ordering is? We can use the evaluation function: remember, you're computing the evaluation function anyway, and what you can do is, for max nodes, order the successors by decreasing evaluation function, and for min nodes, order the successors by increasing evaluation function. That allows you to prune as much as possible. All right, so with that, I'll see you guys next lecture, talking about TD learning.

================================================================================
LECTURE 022
================================================================================
Game Playing 2 - TD Learning, Game Theory | Stanford CS221: Artificial Intelligence (Autumn 2019)
Source: https://www.youtube.com/watch?v=WoFwXj4p4Sc
---
Transcript

[00:00:04] Let's start, guys. Okay, so we're going to continue talking about games today. Just a quick announcement: the project proposals are due today, I think you all know that. All right, so... tomorrow, right? Okay, yeah, today is not Thursday; they're due tomorrow.
For a second I thought it was Thursday. [00:00:32] All right, so let's talk about games. We started talking about games last time: we formalized them, we talked about zero-sum two-player games that were turn-taking, right, and we talked about a bunch of different strategies to solve them, like the minimax strategy or the expectimax strategy. Today we want to talk a little bit about learning in the setting of games: what does learning mean, how do we learn those evaluation functions that we talked about? And then towards the end of the lecture we want to talk a little bit about variations of the games we have talked about: what about the cases where we have simultaneous games or non-zero-sum games? So that's the plan for today.

[00:01:13] So I'm going to start with a question that we're actually going to talk about towards the end of the lecture, but it's a good motivation. Think about a setting where we have a simultaneous two-player zero-sum game. It's a two-player zero-sum game, similar to the games we talked about last time, but it is simultaneous: you're not taking turns, you're playing at the same time. An example of that is rock-paper-scissors. So, can you still be optimal if you reveal your strategy? Say you're playing with someone and you tell them what your strategy is; can you still be optimal? That's the question.

[00:01:54] [A student suggests that if you reveal exactly what you're going to play you won't be successful in a small zero-sum game, but at a larger scale you could still be successful if your approach is superior to the others.] So the answer was about the size of the game, rock-paper-scissors being small versus not being small. The question is more of a motivating thing; we'll talk about the details towards the end of the class. It's actually not the size that matters, it's the type of strategy that you play that matters, just to give you an idea. The reason we have put this at the beginning of the lecture is that, intuitively, when you think about this you might say, no, I'm not going to tell you what my strategy is, because if I say I'm going to play scissors, you'll know what to play. But this has an interesting answer that we're going to get to towards the end of the lecture, so it's more of a motivating example; don't think about it too hard.

[00:02:48] All right, so let's do a quick review of games. Last time we talked about having an agent and an opponent playing against each other. You were playing for the agent, and the agent was trying to maximize its utility.
The example we looked at was: the agent is going to pick bucket A, bucket B, or bucket C, and then the opponent is going to pick a number from those buckets; they can either pick -50 or 50, 1 or 3, or -5 or 15. And then, if you want to maximize your utility as an agent, you can potentially think that your opponent is trying to minimize your utility, and you can have this minimax game, the two of you playing against each other, and based on that decide what to do. So we had this minimax tree, and based on that, the utilities that are going to pop up are -50, 1, and -5. So if your goal is to maximize your utility, you're going to pick bucket B, the second bucket, because that's the best thing you can do assuming your opponent is a minimizer. So that was the setup that we started looking at, and the way we thought about solving this game was by writing a recurrence.
[00:03:54] So we had this value V, which was the minimax value at state s. If you're at an end state, you're going to get the utility of s, because we get the utility only at the very end of the game. If the agent is playing, the recurrence is to maximize V over the successor states, and if the opponent is playing, we want to minimize the value over the successor states. So that was the recurrence we started with. And we looked at games that are pretty large, like the game of chess; if you think about the game of chess, the branching factor is huge and the depth is really large, so it's not practical to do the full recurrence. So we started talking about ways of speeding things up, and one way to speed things up is this idea of using an evaluation function.
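For reference, the recurrence just reviewed can be written out as follows (using the IsEnd/Succ/Actions notation from the minimax lecture):

```latex
V_{\text{minimax}}(s) =
\begin{cases}
\text{Utility}(s) & \text{if } \text{IsEnd}(s)\\[2pt]
\displaystyle\max_{a \in \text{Actions}(s)} V_{\text{minimax}}(\text{Succ}(s,a)) & \text{if } \text{Player}(s) = \text{agent}\\[2pt]
\displaystyle\min_{a \in \text{Actions}(s)} V_{\text{minimax}}(\text{Succ}(s,a)) & \text{if } \text{Player}(s) = \text{opp}
\end{cases}
```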
So, do the recurrence, but only do it until some depth: don't go over the full tree, just go down to some depth, and after that just call an evaluation function. Hopefully your evaluation function, which is a weak estimate of your value, is going to work well and give you an idea of what to do next. So instead of the usual recurrence, what we did was add this d here, the depth until which we are exploring; we decrease the value of the depth after the agent and the opponent play, and when the depth is equal to zero, we just call the evaluation function. Intuitively, if you're playing chess, for example, you might think a few steps ahead, and when you think a few steps ahead, you might think about what the board looks like and evaluate that board based on its features.
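A minimal sketch of this depth-limited recurrence, reusing the bucket game from the review as the toy game. The `BucketGame` encoding, and the convention of decrementing d only after the opponent moves (one full round), are my assumptions; the transcript only says depth decreases after the agent and opponent play.

```python
class BucketGame:
    """Toy game: agent picks a bucket, opponent picks a number from it."""
    buckets = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}

    def is_end(self, s):  return isinstance(s, int)   # a picked number ends the game
    def utility(self, s): return s
    def player(self, s):  return "agent" if s == "root" else "opp"
    def actions(self, s): return list(self.buckets) if s == "root" else self.buckets[s]
    def succ(self, s, a): return a

def value(game, s, d, eval_fn):
    """Depth-limited minimax value V(s, d)."""
    if game.is_end(s):
        return game.utility(s)
    if d == 0:
        return eval_fn(s)                  # cutoff: weak estimate of the value
    if game.player(s) == "agent":          # agent maximizes
        return max(value(game, game.succ(s, a), d, eval_fn) for a in game.actions(s))
    else:                                  # opponent minimizes; round over, so d - 1
        return min(value(game, game.succ(s, a), d - 1, eval_fn) for a in game.actions(s))

game = BucketGame()
print(value(game, "root", 1, lambda s: 0))   # 1 -> bucket B is the best choice
```

With d large enough to reach the end states, the evaluation function is never called and this reduces to plain minimax.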
Based on that you might decide to take various actions, so it's a similar type of idea. And then the question was, well, how are we going to come up with an evaluation function? Where is this evaluation function coming from? One idea that we talked about last time was that it can be handcrafted: the designer can come in and sit down and figure out what a good evaluation function is. In the game of chess, an example is an evaluation function that depends on the number of pieces you have, the mobility of your pieces, maybe the safety of your king, central control, all these various things that you might care about. So the difference between the number of queens that you have and your opponent's number of queens, these are features that you care about, and potentially a designer can come in and say, well, I care about queens nine times more than I care about how many pawns I have. So you can actually hand-design these things and write down these weights for how much you care about each feature. I'm using terminology from the learning lectures, right? I'm saying we have weights here and we have features here, and someone can come in and just handcraft them.

[00:06:37] Well, one other thing we can do, instead of handcrafting it, is to actually try to learn this evaluation function. We can still have the features, right? We can still say, well, I care about the number of kings and queens that I have, but I don't know how much I care about them, and I actually want to learn that evaluation function, learn what the weights should be. To do that, I can write my evaluation function Eval(s) as a function of the state, parameterized by weights w, and my goal is to figure out what these weights w are.
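A sketch of such a handcrafted linear material evaluation. Only the "a queen is worth about nine pawns" figure comes from the lecture; the other weights are standard chess conventions I'm filling in for illustration, not values from the transcript.

```python
# hand-designed weights: how much the designer cares about each piece type
WEIGHTS = {"K": 10000, "Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material_eval(mine, theirs):
    """Eval(s) = sum over pieces p of w_p * (my count of p - opponent's count of p)."""
    return sum(w * (mine.get(p, 0) - theirs.get(p, 0)) for p, w in WEIGHTS.items())

# up a queen, down three pawns: still ahead by 9 - 3 = 6
print(material_eval({"Q": 1, "P": 5}, {"P": 8}))   # 6
```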
And ideally I want to learn them from some data. Okay, so we're going to talk about how learning is applied to the game setting; specifically, the way we're using learning in these game settings is just to get a better sense, from some data, of what this evaluation function should be. So the questions you might have right now are: well, what does V look like, and where does my data come from? Because if you know where your data comes from and what your V is, then all you need to do is come up with a learning algorithm that takes your data and tries to figure out what your V is. So we're going to talk about that in the first part of the lecture, and that introduces this temporal difference learning, which we're going to discuss in a second; it's very similar to Q-learning. And then towards the end of the class we'll talk about simultaneous games and non-zero-sum games.

[00:08:02] All right, so let's start with this V function. I just said this V function could be parameterized by a set of weights w, and the simplest form of this V function is to just write it as a linear predictor, as a linear function of a set of features: w dot phi(s). These are features that are hand-coded, someone writes them down, and then I just want to figure out what the w's are. So this is the simplest form, but in general this V function doesn't need to be linear; it can actually be any supervised learning model that you have discussed in the first few lectures. It can be a neural network that does regression, it can be anything even more complicated than that. Basically, any model you could use in supervised learning could be placed here
as my V function. [00:08:44] So all I'm doing is writing this V function as a function of the state and a bunch of parameters; those parameters, in the case of the linear predictor, are just the w's, and in the case of a one-layer neural network, they are the w's and the v's. All right, so let's look at an example, and I'm going to focus on the linear way of looking at this, just for simplicity. Okay, let's pick a game: we're going to look at backgammon. [00:09:16] So this is a very old two-player game. The way it works is you have the red player and the white player, each of them has these pieces, and what they want to do is move all their pieces from one side of the board to the other side of the board. It's a game of chance: you roll two dice, and based on the outcome of your dice, you move your pieces various amounts to various columns. There are a bunch of rules; your goal is to get all your pieces off the board, but if you have only one piece somewhere and your opponent gets on top of you, they can push you to the bar and you have to start again. There are a bunch of rules about it; read about it on Wikipedia if you're interested. But we're going to look at a simplified version of it. In this simplified version, I have player O and player X, and I only have four columns: columns 0, 1, 2, and 3. In this case I have four pieces for each of these players, and the idea is we want to come up with features that we would care about in this game of backgammon. So what are some features that you think might be useful? Remember the learning lecture: how do we come up with feature templates?
a [00:10:32] is still down with the color but it's a mistake so maybe like the location of [00:10:34] mistake so maybe like the location of the X's and O's the number of them yeah [00:10:37] the X's and O's the number of them yeah yeah so like what idea is you have all [00:10:40] yeah so like what idea is you have all these knowledge about the boards so [00:10:41] these knowledge about the boards so maybe we should like care about the [00:10:42] maybe we should like care about the location of the X's maybe we should care [00:10:44] location of the X's maybe we should care about like where the O's are how many [00:10:46] about like where the O's are how many pieces are on the board how many pieces [00:10:47] pieces are on the board how many pieces are off the board so similar type of way [00:10:50] are off the board so similar type of way that we would come up with features in [00:10:51] that we would come up with features in the first few lectures we were basically [00:10:53] the first few lectures we were basically we would do the same thing so a feature [00:10:54] we would do the same thing so a feature template set of feature templates could [00:10:56] template set of feature templates could look like this like number of [00:10:58] look like this like number of X's or OS in column whatever con being [00:11:01] X's or OS in column whatever con being equal to some value or a number of [00:11:03] equal to some value or a number of excess zeros on the bar may be fraction [00:11:06] excess zeros on the bar may be fraction of excesses or OS that are removed whose [00:11:08] of excesses or OS that are removed whose turn it is so these are all like [00:11:10] turn it is so these are all like potential features that it could be so [00:11:11] potential features that it could be so for this particular board here are what [00:11:14] for this particular board here are what those features would look like so if you [00:11:16] those features would look like so if you look at 
number of OS in column 0 equal [00:11:18] look at number of OS in column 0 equal to 1 that's equal to 1 remember we were [00:11:20] to 1 that's equal to 1 remember we were using these indicator functions to be [00:11:22] using these indicator functions to be more general so like here again we are [00:11:24] more general so like here again we are using this indicator functions you might [00:11:26] using this indicator functions you might ask number of O's on a bar that's equal [00:11:28] ask number of O's on a bar that's equal to one fraction of O's that are removed [00:11:31] to one fraction of O's that are removed so I have four pieces two of them are [00:11:33] so I have four pieces two of them are already removed so that's one half [00:11:34] already removed so that's one half number of X's in column one equal to 1 [00:11:37] number of X's in column one equal to 1 that's one number of X's and columns [00:11:38] that's one number of X's and columns three equal to three that's one it's to [00:11:41] three equal to three that's one it's to stern so that's a cool okay so so we [00:11:43] stern so that's a cool okay so so we have a bunch of features these features [00:11:45] have a bunch of features these features kind of explain what the sport looks [00:11:47] kind of explain what the sport looks like or how good this world is and what [00:11:49] like or how good this world is and what we want to do is we want to figure out [00:11:50] we want to do is we want to figure out what it what are the weights that we [00:11:53] what it what are the weights that we should put for each one of these [00:11:54] should put for each one of these features and how much we should care [00:11:55] features and how much we should care about each one of these features so that [00:11:57] about each one of these features so that is the goal of learning here okay all [00:12:01] is the goal of learning here okay all right so okay so that was my model right [00:12:03] right so okay so that 
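As a rough sketch of the indicator-style feature templates described above: the board encoding below (dicts of column counts, bar counts, removed counts, and whose turn it is) is an assumption for illustration, not the lecture's actual code.

```python
# Sketch of indicator-feature templates for the simplified backgammon board.
# The board representation is a hypothetical encoding, not from the lecture.

def extract_features(board):
    """Return a sparse feature vector as a dict: feature name -> value."""
    features = {}
    for player in ("O", "X"):
        # Indicator templates: number of <player>'s in column c equals n.
        for col, n in board["columns"][player].items():
            features[f"num {player} in column {col} is {n}"] = 1
        # Indicator: number of <player>'s on the bar equals n.
        features[f"num {player} on bar is {board['bar'][player]}"] = 1
        # Real-valued template: fraction of <player>'s pieces removed.
        features[f"fraction {player} removed"] = board["removed"][player] / 4
    # Indicator: whose turn it is.
    features[f"it is {board['turn']}'s turn"] = 1
    return features

# The example board from the lecture: one O in column 0, one O on the bar,
# two O's removed, one X in column 1, three X's in column 3, O to move.
board = {
    "columns": {"O": {0: 1}, "X": {1: 1, 3: 3}},
    "bar": {"O": 1, "X": 0},
    "removed": {"O": 2, "X": 0},
    "turn": "O",
}
print(extract_features(board))
```

The dict-of-named-features style mirrors the sparse feature vectors used earlier in the course: only the indicators that fire appear, with value 1, alongside the real-valued "fraction removed" entries.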
Okay, so that was my model. [00:12:03] So far I've talked about this V(s; w): I defined it as a linear predictor, w dot features. And now the question is, where do I get data? Because if I'm doing learning, I've got to get data from somewhere. [00:12:19] So the idea we can use here is to generate data based on our current policy, pi agent or pi opponent, which is based on our current estimate of V. Currently I might have some idea of what this V function is; it might be a very bad estimate of V, but that's okay, I can just start with that. [00:12:40] Starting with the V function I currently have, I can take the argmax of V over successors of (s, a) to get a policy for my agent; remember, this was how we were getting a policy in the minimax setting. The policy for the opponent is just the argmin of that V function. [00:12:56] And then when I call these policies I get a bunch of actions, I get a sequence of states based on how we're following these policies, and that is data I can actually go over to try to make V better and better. So that's how we do it: we call these policies, we get a bunch of episodes, and we go over them to make things better and better. That's the key idea. [00:13:19] One question you might have at this point is: is this deterministic or not? Do I need to do something like epsilon-greedy? In general you would need to do something like epsilon-greedy, but in this particular case you don't really need to, because we have the dice: by rolling the dice you get different random paths that you might take, which take you to different states. So we already have an element of randomness that does some of the exploration for us. [00:13:48] A student asks why epsilon-greedy would be needed. What I mean here is: do I need to do extra exploration? Am I going to get stuck in a particular set of states if I don't explore? In this particular case, because we have this randomness, we don't really need to do that; in general, you might imagine having some sort of epsilon-greedy to make us explore a little bit more. [00:14:11] Okay, so we generate episodes, and then from these episodes we want to learn. Episodes look like state, action, reward, state, and they keep going until you get a full episode. One thing to notice here is that the reward is going to be 0 throughout the episode until the very end of the game; when we end the episode, we might get some reward at that point, or we might not.
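The data-generation loop described here (roll the dice, act greedily with respect to the current V for the agent and adversarially for the opponent, record (s, a, r, s') pieces of experience) can be sketched on a made-up toy game. Everything below, including the little race game and its features, is an illustrative assumption, not the lecture's backgammon setup:

```python
import random

# Toy sketch of generating an episode from the current evaluation function.
# The game (a race to position 10 with a two-sided die) is a made-up
# stand-in for backgammon; V is a hand-rolled linear evaluation w . phi(s).

N = 10  # first player to reach position N ends the game

def phi(state):
    agent_pos, opp_pos, turn = state
    return [agent_pos / N, opp_pos / N]

def V(state, w):
    return sum(wi * fi for wi, fi in zip(w, phi(state)))

def successor(state, action, roll):
    """Deterministic once the dice roll is known: mover advances action + roll."""
    agent_pos, opp_pos, turn = state
    if turn == "agent":
        return (min(N, agent_pos + action + roll), opp_pos, "opp")
    return (agent_pos, min(N, opp_pos + action + roll), "agent")

def generate_episode(w, seed=0):
    rng = random.Random(seed)
    state = (0, 0, "agent")
    episode = []
    while state[0] < N and state[1] < N:
        roll = rng.choice([0, 1])  # the dice supply the exploration randomness
        actions = [1, 2]
        # Agent maximizes V over successors; opponent minimizes (minimax-style).
        best = max if state[2] == "agent" else min
        a = best(actions, key=lambda act: V(successor(state, act, roll), w))
        nxt = successor(state, a, roll)
        # Reward is 0 throughout; only the terminal transition pays off.
        r = 1 if nxt[0] >= N else (-1 if nxt[1] >= N else 0)
        episode.append((state, a, r, nxt))
        state = nxt
    return episode

episode = generate_episode(w=[1.0, -1.0])
assert all(r == 0 for _, _, r, _ in episode[:-1])  # zero reward until the end
```

Note the structure this produces: a list of (s, a, r, s') tuples where every reward is 0 except possibly the last one, which is exactly the shape of episode the lecture works with next.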
But the reward throughout is going to be equal to 0, because we're playing a game, right? You're not getting any rewards in the middle. [00:14:41] And if you think about each one of these small pieces of experience, (s, a, r, s'), you can try to learn something from each one of them. Okay, so going back to the board, what you have is a piece of experience: you are at some state s, you take an action a, you get some reward, maybe it is zero, that's fine, and you go to some s'. [00:15:13] And you have some prediction: your prediction is your current V function. So your prediction is going to be this V function at the state s, parameterized with w; this is what you already kind of know right now, your current estimate of what V is. And I'm writing the prediction as a function of w, because it depends on w. [00:15:41] And then we have a target that we're trying to get to, and my target, which kind of acts as a label, is going to be equal to the reward that I'm getting, plus, I'm going to write the discount factor, gamma times V of s'. So my target, the thing I'm trying to get to, is r plus gamma V(s'; w). [00:16:24] We're playing games, and in games gamma is usually 1; I'm going to keep it here for now, but I'm going to drop it at some point, so you don't really need to worry about gamma. [00:16:31] One other thing to notice here is that I'm not writing the target as a function of w, because the target acts kind of like my label: if I'm trying to do regression here, the target is my label, it's kind of the ground-truth thing I'm trying to get to. So I'm going to treat my target as just a value; I'm not writing it as a function of w. [00:16:49] All right, so what do we usually try to do when we're doing learning and we have a prediction and a target? Minimize the error, yes. So I can write my error as the squared error: 1/2 times (prediction(w) minus target) squared. That is my squared error, and I want to minimize it with respect to w. [00:17:18] How do I do that? I can take the gradient. What is the gradient equal to? This is simple: the 2 and the 1/2 cancel, and the gradient is just (prediction(w) minus target) times the gradient of the inner expression. The gradient of the inner expression with respect to w is the gradient of the prediction with respect to w, minus zero, since the target is treated as a number. [00:18:04] Okay, let me move this up. So now I have the gradient; what algorithm should I use? I can use gradient descent. So I'm going to update my w; how do I update it? I move in the negative direction of my gradient using some learning rate eta: w becomes w minus eta times (prediction(w) minus target) times the gradient of prediction(w). [00:18:38] That's actually what's on the slide: the objective function is (prediction minus target) squared, the gradient is (prediction minus target) times the gradient of the prediction, and the update just moves in the negative direction of the gradient. This is what you guys have seen already. All right, so far so good. [00:18:59] So this is the TD learning algorithm; this is all it does.
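This update can be written out in a few lines for a linear value function (the feature vectors and step size below are illustrative numbers, not from the lecture):

```python
# TD-style gradient step on the squared error 1/2 * (prediction - target)^2,
# for a linear value function V(s; w) = w . phi(s).

def predict(w, phi_s):
    return sum(wi * fi for wi, fi in zip(w, phi_s))

def td_update(w, phi_s, reward, phi_sp, eta=0.1, gamma=1.0):
    p = predict(w, phi_s)                    # prediction V(s; w)
    t = reward + gamma * predict(w, phi_sp)  # target r + gamma V(s'; w),
                                             # treated as a constant label
    # Gradient of 1/2 (p - t)^2 w.r.t. w is (p - t) * phi(s) for linear V.
    return [wi - eta * (p - t) * fi for wi, fi in zip(w, phi_s)]

# Illustrative numbers: one step shrinks the squared error.
w = [0.0, 0.0]
phi_s, phi_sp = [1.0, 2.0], [0.0, 1.0]
before = 0.5 * (predict(w, phi_s) - (1.0 + predict(w, phi_sp))) ** 2
w2 = td_update(w, phi_s, reward=1.0, phi_sp=phi_sp)
after = 0.5 * (predict(w2, phi_s) - (1.0 + predict(w2, phi_sp))) ** 2
```

Since the target is treated as a constant, the step is plain gradient descent on a regression loss; the only TD-specific part is that the "label" is built from the current w.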
[00:19:07] Temporal difference learning picks these pieces of experience (s, a, r, s'), and based on each piece of experience it just updates w with this gradient descent update: the difference between prediction and target, times the gradient of V. [00:19:24] So what happens if I have a linear function? Let me write this in the case where I have a linear function. What if my V(s; w) is just equal to w dot phi(s)? What happens to my update? w becomes w minus eta times, what is the prediction? It's w dot phi(s). What is the target? We defined it up there: it's the reward you're getting, the immediate reward, plus gamma times V(s'; w), which is w dot phi(s'). And the gradient of your prediction, which is what? It's phi(s). [00:20:20] So I just wrote the update in the case of a linear predictor. A student asks what the difference is between this and Q-learning. Yeah, so this is very similar to Q-learning; there are very minor differences that we'll talk about at the end of this section, comparing it to Q-learning. [00:20:34] All right, so I want to go over an example. It's kind of a tedious example, but I think it helps to go over it and say why it works, especially in the case where the reward is just equal to zero throughout an episode; it kind of feels funny to use this algorithm and have it work, but it works. So I want to go over one example of this. [00:20:53] I'm going to show you one episode, starting from s1 to some other state. I have an episode: I start from some state, and I get some features of that state; again, these features come from just evaluating those hand-coded features. And what w should I start with? Let me just initialize w to be equal to 0.
ok right how do I update my W [00:21:21] equal to 0 ok right how do I update my W maybe let me just write it in this so so [00:21:23] maybe let me just write it in this so so this is I want to write in as simple or [00:21:26] this is I want to write in as simple or not the simpler form we're just in the [00:21:27] not the simpler form we're just in the other form so W the way we're updating [00:21:29] other form so W the way we're updating it is previous W - ADA times prediction [00:21:34] it is previous W - ADA times prediction - target I'm going to use P and T for [00:21:36] - target I'm going to use P and T for prediction - target times V of s this is [00:21:40] prediction - target times V of s this is update you are doing ok yeah that's [00:21:44] update you are doing ok yeah that's right okay so so what is my prediction [00:21:47] right okay so so what is my prediction with my prediction W dot C of s 0 what [00:21:54] with my prediction W dot C of s 0 what is my target so for my target I need to [00:21:56] is my target so for my target I need to know what state I'm ending up at I'm [00:21:58] know what state I'm ending up at I'm gonna end up at 1 0 in this episode and [00:22:01] gonna end up at 1 0 in this episode and I'm gonna get a reward of 0 so what is [00:22:03] I'm gonna get a reward of 0 so what is my target my target is reward which is 0 [00:22:06] my target my target is reward which is 0 plus double [00:22:07] plus double times V of s prime that is zero because [00:22:09] times V of s prime that is zero because W is equal to zero so my target is equal [00:22:11] W is equal to zero so my target is equal to zero my P minus P is equal to zero so [00:22:15] to zero my P minus P is equal to zero so P minus C is equal to zero this whole [00:22:17] P minus C is equal to zero this whole thing is 0 W stays the same so in the [00:22:20] thing is 0 W stays the same so in the next kind of step that we use just okay [00:22:25] next kind of step that we use just 
okay I'm gonna move forward so what is [00:22:29] I'm gonna move forward so what is prediction here 0 x 0 prediction is 0 [00:22:33] prediction here 0 x 0 prediction is 0 what is target I haven't yeah it's 0 [00:22:36] what is target I haven't yeah it's 0 because I haven't got any anything any [00:22:38] because I haven't got any anything any word yet about 1/2 so yeah so target is [00:22:44] word yet about 1/2 so yeah so target is going to be a reward which is 0 plus 0 [00:22:47] going to be a reward which is 0 plus 0 times whatever state of V of s prime [00:22:49] times whatever state of V of s prime that I'm at so that's equal to 0 P minus [00:22:51] that I'm at so that's equal to 0 P minus C is equal to 0 it's kind of boring so [00:22:54] C is equal to 0 it's kind of boring so at this point W haven't changed W is [00:22:59] at this point W haven't changed W is equal to 0 what is my prediction [00:23:01] equal to 0 what is my prediction prediction is equal to 0 that's great [00:23:04] prediction is equal to 0 that's great what is target equal to so I'm gonna end [00:23:07] what is target equal to so I'm gonna end up in an end state where I get 1 0 and I [00:23:13] up in an end state where I get 1 0 and I get a reward of 1 so this is the first [00:23:16] get a reward of 1 so this is the first time I'm getting a reward which is my [00:23:18] time I'm getting a reward which is my target to be my target is reward 1 plus [00:23:23] target to be my target is reward 1 plus 0 times 1 0 which is 0 so my target is 1 [00:23:26] 0 times 1 0 which is 0 so my target is 1 so what this tells me is I'm predicting [00:23:29] so what this tells me is I'm predicting 0 but my target is 1 so I need to push [00:23:32] 0 but my target is 1 so I need to push my w's a little bit up to actually [00:23:34] my w's a little bit up to actually address the fact that this is this is [00:23:35] address the fact that this is this is this is equal to 1 so P minus C is equal [00:23:38] this is 
equal to 1 so P minus C is equal to minus 1 so I need to do an update [00:23:41] to minus 1 so I need to do an update maybe I'll do that update here so how am [00:23:44] maybe I'll do that update here so how am i updating it so I'm doing starting from [00:23:46] i updating it so I'm doing starting from zero zero minus my ADA is 0.5 that's [00:23:51] zero zero minus my ADA is 0.5 that's what I allowed it like I put it I [00:23:53] what I allowed it like I put it I defined it to be my prediction - target [00:23:55] defined it to be my prediction - target is minus 1 what is fee of s P of s is 1 [00:23:59] is minus 1 what is fee of s P of s is 1 2 right so what should my new W be for [00:24:07] 2 right so what should my new W be for this an equal to point 5 and then 1 X [00:24:12] this an equal to point 5 and then 1 X I'm just doing arithmetic here [00:24:13] I'm just doing arithmetic here so my new W is going to become 0.5 and 1 [00:24:17] so my new W is going to become 0.5 and 1 at the end of this one episode so I just [00:24:20] at the end of this one episode so I just did a 1 episode 1 full [00:24:21] did a 1 episode 1 full we're w0 throughout and then at the very [00:24:24] we're w0 throughout and then at the very end when I got a reward then I updated [00:24:26] end when I got a reward then I updated my W because I realized that my [00:24:28] my W because I realized that my prediction and target or not the same [00:24:29] prediction and target or not the same thing okay so now I'm gonna I'm gonna [00:24:32] thing okay so now I'm gonna I'm gonna start a new episode and the new episode [00:24:34] start a new episode and the new episode I'm starting is going to start with this [00:24:36] I'm starting is going to start with this particular W and in the new episode even [00:24:38] particular W and in the new episode even though the rewards are gonna be 0 [00:24:40] though the rewards are gonna be 0 throughout so like we're actually going [00:24:41] throughout so like 
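The arithmetic of this first episode can be replayed in a few lines, with eta = 0.5 and gamma = 1 as in the lecture. The feature vectors for the middle steps are placeholders (any features give a zero update while w = 0 and r = 0); the final state has features [1, 0], the state being updated has features [1, 2]:

```python
# Replaying the worked episode: w starts at 0, the reward is 0 until the
# final transition, and only the last update changes w.

eta, gamma = 0.5, 1.0
w = [0.0, 0.0]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# (phi(s), reward, phi(s')) triples for one episode; the last transition is
# terminal with next-state features [1, 0] and reward 1, as in the lecture.
episode = [
    ([1.0, 0.0], 0.0, [0.0, 1.0]),   # middle steps: P = T = 0, no change
    ([0.0, 1.0], 0.0, [1.0, 2.0]),
    ([1.0, 2.0], 1.0, [1.0, 0.0]),   # terminal: prediction 0, target 1
]

for phi_s, r, phi_sp in episode:
    p = dot(w, phi_s)                 # prediction w . phi(s)
    t = r + gamma * dot(w, phi_sp)    # target r + gamma * w . phi(s')
    w = [wi - eta * (p - t) * fi for wi, fi in zip(w, phi_s)]

print(w)  # -> [0.5, 1.0]
```

As in the lecture, the first two transitions leave w untouched (prediction and target are both 0), and the last one pushes w up to [0.5, 1].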
And in the new episode, even though the rewards are going to be 0 throughout, we're actually going to update our w's. This is amazing. [00:25:15] A student asks about states that share the same features. Yeah, it depends on what sort of features you use; you could use features that are really not representative. If you really want, say, s4 and s1 and s9 to be differentiated, we should pick features that differentiate between them; but if they are kind of the same and have the same sort of characteristics, it's fine. [00:25:53] Another student asks about a feature entry that never changes. Yeah, that entry will never converge, and that kind of tells you that you don't care about that entry in your feature vector: it's always staying the same, and if it is always zero it doesn't matter what the weight of that entry is. So in general we want features that are differentiating; otherwise you're losing something. [00:26:07] For the second row I'm not going to write it all out, because that takes time. Okay, so let's start a new episode. We start this one again, but now I'm starting with this new w that I have. So I can compute the prediction, the prediction is 1, and I can compute my target, it's 0.5, and what we realize here is that we overshot. Before, the prediction was 0 and the target was 1: we were undershooting, and we fixed our w's; but now we're overshooting, so we need to fix that. [00:26:38] A student asks about the relationship between the features and the weights: do they always have to be the same dimension, and what should we be thinking about that would make a good feature, specifically with respect to updating the weights? Okay, so first off, yes, they always need to be the same dimension, because you're doing this dot product between them. And for feature selection, you don't necessarily think of it as "how am I updating the weights"; you think of feature selection as: is it representative of how good my board is, in the case of backgammon, or of how well I'm navigating? [00:27:13] So it should be a representation of how good your state is, and it's usually hand-designed. You shouldn't think of it as "how is it helping my weights"; you should think of it as "how is it representing how good my state is". [00:27:25] A student asks: in the blackjack example, where you have a threshold of 21 and then another threshold, if you're using the same feature extraction for both, how does that affect the generalizability of the model, of the agent? Yeah, so you might choose two different features, and there is kind of a trade-off: you might get a feature that differentiates between different states very well, but then that makes learning longer and makes it not as generalizable; on the other hand, you might get a feature that's pretty generalizable, but then it might not capture the specific things you would want, those differentiating factors. So picking features is an art. [00:28:04] All right, let me move forward, because we have a bunch of things coming up. I'll go over this real quick then: we now update w based on this new value, and it's a similar thing: you have a prediction, you have a target, you're still overshooting, so you still need to update; and once you update it to [0.25, 0.75] it kind of stays there, and you're happy. [00:28:31] Okay, so this was just an example of TD learning, but this is the update you have kind of already seen, and a lot of you pointed out that this is similar to Q-learning.
[00:28:42] Right, this is actually pretty similar — the update is very similar. We have these gradients, the same way we have in Q-learning, and we are looking at the difference between prediction and target, the same way we were in Q-learning. But there are some minor differences. The first difference is that Q-learning operates on the Q function, and the Q function is a function over states and actions; here we are operating on a value function, V, and V is only a function of the state. Part of the reason is that in the setting of a game, you already know the rules of the game, so you kind of already know the actions — you don't need to worry about them the same way you do in Q-learning. The second difference is that Q-learning is an off-policy algorithm: the values are based on this estimate of the optimal policy, which is Q_opt. But TD learning is on-policy: the values are based on the exploration policy, which is based on a fixed π — and sure, you're updating the π, but you're going with whatever π you have, kind of running with that, and you keep updating it. OK, so that's another difference. And then finally, in Q-learning you don't need to know the MDP transitions — you don't need to know the transition function from s to s' — but in TD learning you need to know the rules of the game, so you need to know how the successor function Succ(s, a) works. So those are some minor differences, but from the perspective of how your update works, it is pretty similar to Q-learning.
[00:30:20] All right, so that was this idea: I have this evaluation function, I want to learn it from data — I'm going to generate data, and from that generated data I'm going to update my w. That's what we've been talking about so far. And the idea of using learning to play games is not a new idea, actually. In the '50s, Samuel looked at a checkers game program, where he was using ideas from self-play and ideas similar to the things we have talked about: using really smart features and linear evaluation functions to try to solve the checkers program. A bunch of other things he did included adding intermediate rewards — so throughout, on the way to the end point, he added some intermediate rewards — and using alpha-beta pruning and some search heuristics. And it was kind of impressive what he did in the '50s: he ended up with a program that was reaching human amateur level of play.
[00:31:12] And he only used something like 9K of memory, which is really impressive if you think about it. So this idea of learning in games is old; people have been using it. In the case of backgammon, this was around the '90s, when Tesauro came up with an algorithm to solve the game of backgammon. He specifically used the TD(λ) algorithm, which is similar to the TD learning that we have talked about; it has this λ parameter that kind of tells us how much credit states get as they get far from the reward. He didn't have any intermediate rewards, he used really dumb features, but then he used neural networks, which was kind of cool, and he was able to reach human expert play. And this kind of gave us some insights into how to play games and how to solve these really difficult problems. And then more recently, we have been looking at the game of Go.
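TD(λ) as Tesauro used it is standard TD with an eligibility trace decayed by λ, so credit for an error flows back to recently visited states. A minimal sketch with linear features — the step size, λ value, and episode format here are assumptions for illustration:

```python
import numpy as np

def td_lambda_episode(w, episode, eta=0.1, gamma=1.0, lam=0.7):
    """One pass of TD(lambda) over an episode of (phi_s, r, phi_s_next)."""
    z = np.zeros_like(w)                                # eligibility trace
    for phi_s, r, phi_s_next in episode:
        delta = r + gamma * (w @ phi_s_next) - (w @ phi_s)  # TD error
        z = gamma * lam * z + phi_s                     # recent states keep more credit
        w = w + eta * delta * z                         # update all traced weights
    return w
```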
[00:32:02] So in 2016 we had AlphaGo, which was using a lot of expert knowledge in addition to ideas from Monte Carlo tree search, and then in 2017 we had AlphaGo Zero, which wasn't even using expert knowledge — it was all based on self-play, it was using dumb features and neural networks, and basically the main idea was using Monte Carlo tree search to try to solve this really challenging, difficult problem. I think in this week's section you're going to talk a little bit about AlphaGo Zero too, if you're attending section. All right, so the summary so far: we've been talking about parameterizing these evaluation functions using features, and the idea of TD learning is to look at the error between our prediction and our target, and try to minimize that error and find better w's as we go. All right, so that was learning in games.
[00:32:59] So now I want to spend a little bit of time talking about other variations of games: the setting where we take our games from turn-based to simultaneous, and then the setting where we go from zero-sum to non-zero-sum. All right. OK, simultaneous games. So far we have talked about turn-based games like chess, where you play, and the next player plays, and you play, and the next player plays — and minimax-type strategies seem to be pretty OK when it comes to solving these turn-based games. But not all games are turn-based, right? An example is rock-paper-scissors: everyone is playing simultaneously, at the same time. The question is, how do we go about solving a simultaneous game? So let's start with a game that is a simplified version of rock-paper-scissors, called the two-finger Morra game. The way it works is we have two players, player A and player B.
[00:34:00] Each player is going to show either one finger or two fingers, and you're playing at the same time. The way it works is: if both players show one at the same time, player B gives two dollars to player A; if both show two at the same time, player B gives player A four dollars; and if you show different numbers — 1 and 2, or 2 and 1 — then player A has to give three dollars to player B. OK, does that make sense? So can you guys talk to your neighbors and play the game? [Music] All right, so what was the outcome? How many of you were in the case where A chose one and B chose one? Oh yeah, one, OK. How about here — A chose one, B chose two? Perfect, like four people played. So, A chose two, B chose one, OK, and then two and two. All right, so you can kind of see a whole mix of strategies happening here.
[00:35:38] This is the game we're going to talk about a little bit, and think about what would be a good strategy to use when you're solving this simultaneous game. All right, so let's formalize this. We have player A and player B, each with the possible actions of showing one or two, and then you're going to use this payoff matrix, which represents A's utility if A chooses action a and B chooses action b. So before, we had this value function over our state; now we have this value function that is, again, from the perspective of agent A. Remember, before, when we were thinking about the value function, we were looking at it from the perspective of the first player — the maximizer player, the agent. Now I'm looking at all of these scores from the perspective of player A, so I'm trying to get good things for A.
[00:36:38] Yeah, and then this is a one-step game too, right? You're just playing once, and then you see what you get — we're not talking about repeated games here. You play, you see what happens. OK, so we have this V, which is V(a, b), and this basically represents A's utility if agent A plays a and agent B plays b. You can represent this with a matrix, and that's why it's called a payoff matrix. I'm going to write that payoff matrix here. So, payoff matrix: here agent A can show one or can show two, and agent B can show one or can show two, right? If both of us show one at the same time, agent A gets two dollars; if both of us show two at the same time, agent A gets four dollars; otherwise agent A has to pay, so agent A gets minus three dollars. And again, the reason I only talk about one value is that we are still in the setting of zero-sum games.
[00:37:37] So whatever agent A gets, agent B gets the negative of, right? So if agent A gets four dollars, agent B is paying — agent B gets minus four dollars. So I'm just writing one V, from the perspective of agent A, and this is called the payoff matrix. All right, so now we need to talk about what a solution means in this setting — what is a policy in this setting? The way we refer to policies in this case is as strategies. We have pure strategies, which are almost the same thing as deterministic policies: a pure strategy is just a single action that you decide to take. So you have pure strategies and mixed strategies. The difference between a pure strategy and a deterministic policy is, if you remember, that a deterministic policy is a function of states, right? It's a policy that, as a function of the state, gives you an action.
[00:38:34] Here we have a one-move game, right? So it's just that one action, and we call it a pure strategy. We also have this other thing called a mixed strategy, which is equivalent to stochastic policies. What a mixed strategy is, is a probability distribution that tells you the probability of choosing each action. So pure strategies are just actions, and then you can have these things called mixed strategies, which are probability distributions over choosing actions. OK, all right, so here is an example. If you say, well, I'm always going to show you one, then you can write that strategy as a pure strategy that says: with probability one I show you one, and with probability zero I show you two — let's say the first column is for showing one and the second column is for showing two. So this is a pure strategy that says I'm always going to show you one.
[00:39:35] If I told you, well, I'm always going to show you two, then I can write that strategy like this, right: with probability 1, I'm always showing you 2. I can also come up with a mixed strategy. A mixed strategy would be: I'm going to flip a coin, and if I get heads I'm going to show you one, and if I get tails I'm going to show you two — and you can write that as [1/2, 1/2], and this is going to be a mixed strategy. You could totally play that in the simultaneous game; you could just bring chance in and be like, half the time I'm going to show you one, half the time I'm going to show you two, based on chance. Everyone happy with mixed strategies and pure strategies? All right, so how do we evaluate the value of a game? Remember, in the previous lecture — and in the MDP lecture even — we were talking about evaluating: if someone gives me a policy, how do I evaluate it?
[00:40:29] So the way we are evaluating that is again with this value function V, and we are going to write this value function as a function of π_A and π_B. I'll just write that up here — or I'm going to erase this. So I'm going to say: the value of agent A following π_A and agent B following π_B — what is that equal to? Well, that is going to be the probability that π_A chooses action a, times the probability that π_B chooses action b, times the value of the choices a and b, summed over all possible a's and b's. OK, so let's look at an actual example of this. For this particular case of the two-finger Morra game, let's say someone comes in and tells you what π_A is: the policy of agent A is just "always show one," and the policy of agent B is this mixed strategy — half the time show one, half the time show two. And the question is: what is the value of these two policies? How do we compute that?
[00:41:39] Well, I'm going to use my payoff matrix, right? So it's 1 times 1/2 times the value that we get for (1, 1), which is equal to 2 — so it's 1 × 1/2 × 2, plus 0 × 1/2 × 4, plus 1 × 1/2 × (−3) — the value that I get there is minus 3 — plus 0 × 1/2 × (−3). And what is that equal to? There are two zeros here, so that's minus 1/2. OK, so I just computed that the value of these two policies is going to be −1/2, and again this is from the perspective of agent A. And it kind of makes sense, right? If agent A tells you "I'm always going to show you 1," and agent B is following this mixed strategy, then agent A is probably losing — agent A is losing 1/2. [In response to a student question] That opens up a whole set of new questions that we're not discussing in this class — that introduces repeated games.
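The computation just worked out can be sketched directly from the payoff matrix (written from agent A's perspective; rows are A showing 1 or 2, columns are B showing 1 or 2):

```python
import numpy as np

# Two-finger Morra payoff matrix, from agent A's perspective.
V = np.array([[ 2.0, -3.0],
              [-3.0,  4.0]])

def game_value(pi_a, pi_b, V):
    # V(pi_a, pi_b) = sum over a, b of pi_a(a) * pi_b(b) * V(a, b)
    return pi_a @ V @ pi_b

pi_a = np.array([1.0, 0.0])  # pure strategy: always show one
pi_b = np.array([0.5, 0.5])  # mixed strategy: fair coin
print(game_value(pi_a, pi_b, V))  # -0.5, matching the value computed above
```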
[00:43:13] So you might be interested in looking at what happens in repeated games. In this class, right now, we're just talking about this one-step, one-play setting: we're playing a zero-sum game, like rock-paper-scissors, and you just play once. You might say, oh, what happens if you play, like, 10 times? Then you're building some relationship, and weird things can happen — and that introduces a whole new class of games. All right, so the value is equal to −1/2. OK. All right, so that was the game value. We just evaluated it, right? If someone tells me π_A and π_B, I can evaluate them — I can know how good π_A and π_B are, from the perspective of agent A. OK, so what do we want to do? When we say we want to try to solve games, all we want to do, from agent A's perspective, is maximize this — I want to get as much money as possible.
[00:44:03] And this value is from agent A's perspective, so agent A should be trying to maximize it, and agent B should be trying to minimize it — thinking minimax: agent B should be minimizing this, agent A should be maximizing this. Yeah, that's what we want to do. But the challenge here is that we're playing simultaneously, so we can't really use the minimax tree. Remember, in the minimax tree setting we had sequential play: we could wait for agent A to play and then play after that, and that would give us a lot of information. Here we are playing simultaneously, so what should we do? OK, so what should we do? I'm going to assume we can play sequentially — that's what I want to do for now. And I'm going to limit myself to pure strategies. So maybe I'll come over here. So right now I'm going to focus only on pure strategies.
[00:44:54] I'm just going to consider this very limited setting and see what happens, and I'm going to ask: what if we were to play sequentially? What would happen — how bad would it be? So we have the setting where player A goes first. What do you think — if player A goes first, is that better for player A, or worse? Worse for player A? OK, that's probably what's going to happen; let's find out. So player A is trying to maximize, right, and player B is trying to minimize, and each of them has the actions of either showing 1 or showing 2. This is player A; B can show one or two, right? If we show one–one, player A gets what — two dollars, is that right? That's right. Otherwise, player A gets minus three dollars; and if we have two–two, player A gets four dollars.
sequential setting and we're playing minimax, player B goes second, so player B takes the minimizer: here player B picks this one, and in this case player B picks this one. What should player A do? Well, in both cases player A gets minus three dollars, so it doesn't actually matter: player A can pick either action and at the end of the day gets minus three dollars. That's the case where player A goes first. [00:46:34] What if player A goes second? Then player B goes first, player B is minimizing, then player A is maximizing, and we have the same values here. [00:47:01] Player A, going second, tries to maximize, so A would pick these ones. Player B wants to minimize, so player B is going to say: okay, if
you're going second, I'd rather show you one, because by showing you one I'm losing less; if I show you two I'm losing even more. So in that setting player A is going to get two dollars. [00:47:33] All right, that was kind of intuitive: with pure strategies, it looks like going second should be better. Going second is no worse, it's the same or better, and that can be represented by this minimax relationship: agent A maximizes, and in the second case we maximize on the outside over A's action with player B going first, so that value is greater than or equal to the case where player A goes first. Does that make sense? [00:48:31] I'm going to write the things you're learning on the side of the board, maybe up here. So what did we just learn?
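The two sequential orderings just worked through can be checked in a few lines. A minimal Python sketch, assuming the payoff matrix from the example (one-one pays A $2, two-two pays $4, a mismatch pays −$3); the variable names are my own:

```python
# Sequential play with pure strategies, using the example's payoff matrix
# (values are A's winnings; first index = A's action, second = B's action).
V = {(1, 1): 2, (1, 2): -3, (2, 1): -3, (2, 2): 4}
actions = [1, 2]

# A goes first: A commits, then B minimizes, so A picks the action
# whose worst case is best.
a_first = max(min(V[a, b] for b in actions) for a in actions)

# A goes second: B commits, then A maximizes, so B picks the action
# with the smallest best case for A.
a_second = min(max(V[a, b] for a in actions) for b in actions)

print(a_first, a_second)  # -3 2: going second is better for A
```

Both A-first branches bottom out at −$3, matching the board diagram, while going second guarantees A the $2 cell.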
We learned that with pure strategies, going second is better. [00:48:56] That sounds intuitive, right? Okay, so far so good. The question I want to think about now is: what if we have mixed strategies? What's going to happen? With mixed strategies, is going second better, worse, or the same? [00:49:20] So let's say player A comes in and says: I'm going to reveal my strategy to you. I'm going to flip a coin, and depending on how it comes up I'm either going to show you one or show you two. That's what I tell you I'm going to do. [00:49:34] So what would be the value of the game under that setting?
The value V(π_A, π_B), where π_A is already this mixed strategy of one-half, one-half, is going to be equal to what? We iterate over the four outcomes: π_B(1) · (1/2) · 2 (B shows one, A shows one with probability 1/2, and we get 2), plus π_B(1) · (1/2) · (−3) (B shows one, A shows two, and we get −3), plus π_B(2) · (1/2) · 4 (B shows two, A shows two, and we get 4), plus π_B(2) · (1/2) · (−3) (B shows two, A shows one, and that's −3). [00:50:49] So I just iterated over the four options we can get here, under a policy where π_B chooses one or two and π_A just follows this mixed strategy. What is this equal to? It's equal to −(1/2) · π_B(1) + (1/2) · π_B(2).
So that's the value. Okay, so again the setting is: agent A came in and told me, I'm following this mixed strategy, this is the thing I'm going to do. What should I do as agent B? [00:51:32] Right: you always show one. But why is that? Well, if agent A tells me what they're going to do, I should try to minimize agent A's value, because I don't want agent A to get anything. So I'm trying to come up with a policy that minimizes this expression. These are probabilities, so they're nonnegative numbers, and I have a negative term and a positive term here. The way to minimize the expression is to put as much weight as possible on the negative term and as little as possible on the positive term.
That tells me: never show two, always show one. Does everyone see that? So the best thing I can do as agent B is to follow a pure strategy that always shows one and never shows two. [00:52:24] Okay, this was kind of interesting: if someone comes in and tells me the mixed strategy they're going to follow, I have a response to that, and that response is actually always a pure strategy. I hope that's cool. [00:52:45] This is what happens in the more general case, too. I'm going to make a lot of generalizations in this lecture: I show one example and then generalize it, but if you're interested in the details we can talk offline. So the setting is: for any fixed mixed strategy π_A, where A has told me their mixed strategy, what I should do as agent B is pick π_B to minimize that value.
And that minimum can be attained by a pure strategy. So the second thing we've learned here is: if player A plays a mixed strategy, player B has an optimal pure strategy. That's kind of interesting, right? [00:53:53] Okay, but we still haven't decided what the policies should be. So far we've been talking about the setting where agent A comes in and tells us their policy, and we know how to respond to it with a pure strategy. Now I want to figure out what this mixed strategy should actually be. I want to think about it more generally, so I'll go back to those two diagrams and modify them.
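That best-response claim can be sketched numerically, again assuming the example's payoff matrix. The helper `value` and the candidate grid are my own scaffolding; the point is that the expected value is linear in B's mixing probability, so the minimizer lands at an endpoint, i.e. a pure strategy:

```python
# Best response to a revealed mixed strategy.  A announces "show one
# with probability 1/2"; B searches over its own mixing probability.
V = {(1, 1): 2, (1, 2): -3, (2, 1): -3, (2, 2): 4}

def value(pi_a1, pi_b1):
    """Expected payoff to A, given each player's probability of showing one."""
    pa = {1: pi_a1, 2: 1 - pi_a1}
    pb = {1: pi_b1, 2: 1 - pi_b1}
    return sum(pa[a] * pb[b] * V[a, b] for a in (1, 2) for b in (1, 2))

# Against pi_a1 = 1/2 the value works out to 1/2 - pi_b1, linear in B's
# probability, so B's minimizer sits at an endpoint: a pure strategy.
candidates = [i / 10 for i in range(11)]
best = min(candidates, key=lambda q: value(0.5, q))
print(best, value(0.5, best))  # 1.0 -0.5: B always shows one, A loses $1/2
```

The grid search is only illustrative; linearity already guarantees an endpoint optimum, which is the lecture's point.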
Okay, I'm going to think about both settings. Let's say again that player A decides to go first, and player A is going to follow a mixed strategy. That's all we know; we don't know which mixed strategy player A will decide to follow. Player A is maximizing, and the way I'm writing the mixed strategy more generally is: player A shows one with probability p and shows two with probability 1 − p, for some value of p. [00:55:04] After that it's player B's turn, and we have just seen that the best thing player B can do is a pure strategy: player B either picks one 100% of the time or picks two 100% of the time. [00:55:43] The thing is that the strategies are
probabilities, so they're values from 0 to 1, and you always end up with this negative term that you're trying to make as negative as possible and this positive term that you're trying to keep as small as possible. That's intuitively why you end up with a pure strategy: you put all of your probability on the negative term and none on the positive term, because you're trying to minimize. So you would never get something like 1/2 and 1/2; that would be a mixed strategy, not a pure strategy, and I'm saying you wouldn't get a mixed strategy because you always end up in this
setting where, to minimize, you push all of your probability onto the negative term. [00:56:39] All right, let me go back to this. We have the setting where player A goes first, following a mixed strategy with p and 1 − p, and player B is going to follow a pure strategy, either one or two; I don't know which one yet. So what happens? If B picks one: with probability p, A shows one too, and that gives value 2, so it's 2 times p; plus, with probability 1 − p, A picks two while B picks one, and we get −3, so (1 − p) times (−3). [00:57:24] Then for the other side, if B picks two: with probability 1 − p, A shows two as well, and I get 4, so it's 4 times (1 − p); and with probability p, A is going to show one while I'm
going to show two, so that is −3 times p. All right, what are these equal to? The first is 2p − 3(1 − p) = 5p − 3, and the second is 4(1 − p) − 3p = −7p + 4. [00:58:06] So in this more general case, player A plays first, following a mixed strategy, but doesn't yet know which p to choose; they're choosing some p and 1 − p here, and then player B has to follow a pure strategy, that's what we decided. Under that, we either get 5p − 3 or −7p + 4. What should player B do here? This is player B, at this min node: player B should pick whichever of the two is smaller, so player B takes the minimum of 5p − 3 and −7p + 4. [00:58:56] What should player A do? I'm thinking minimax, right? So when you think about the
minimax play, player A is maximizing the value that comes up here, and I'm also saying player A needs to decide which p they're picking, so they pick a p that maximizes it. [00:59:30] [A question about these computations.] Yeah, these are the four entries of my payoff matrix. I'm saying: with probability p, A is going to show me one, and if I go down the branch where B is also choosing one, both of us are showing one, so I get two dollars; that's where the 2 comes from, times probability p. With probability 1 − p, A is going to show me two while I show one; that's −3, times probability 1 − p. So for this particular branch I know the payoff is going to be 5p − 3. Does that make sense? And then for this side,
again, with probability 1 − p, A is going to show me two, and if both of us show two I get $4; that's why it's 4 times (1 − p). With probability p, A is going to show me one while I show two, so I lose $3; that's −3 times p. So that branch is −7p + 4. [01:00:27] And then what the second player does is minimize between these two values: they're deciding whether to pick one or two based on which one minimizes. I'm writing it using this variable p, which is not decided yet, and this variable p is the thing player A needs to decide. What p should player A pick? Player A should pick the p that maximizes, so I'm literally writing a minimax relationship here. All
right, so the interesting thing here is that 5p − 3 is a line with positive slope, and −7p + 4 is another line, with negative slope. Player B takes the minimum of these two lines; where is that minimum largest? Where the lines meet each other. So the p I'm going to pick is the value where the two are equal, 5p − 3 = −7p + 4, and that turns out to be p = 7/12, where the value is −1/12. [01:02:07] Okay, let's recap. What did I do? I'm talking about the simultaneous game, but I'm relaxing it and making it sequential. I'm saying
A is going to play first, B is playing second. A, playing first, decides to choose a mixed strategy; maybe A says 1/2-1/2, or maybe A doesn't want 1/2-1/2 and comes up with some other probabilities. So what A is deciding is: should I pick one with probability p and two with probability 1 − p, and what should that p be? Whatever A decides with p and 1 − p ends up in two different results, and B tries to minimize between those two linear functions. The two linear functions meet at one point; that's where the minimum is largest, and it corresponds to the p-value that A, who wants to maximize, should pick.
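The recap above can be written out exactly in a few lines. This is a sketch under the example's payoffs; `inner_min` is my name for the value B's best response leaves A with:

```python
# A moves first with a mixed strategy (show one w.p. p).  B's two pure
# responses leave A with 5p - 3 (B shows one) or -7p + 4 (B shows two);
# B takes the minimum, and A picks the p that maximizes that minimum.
from fractions import Fraction  # exact arithmetic instead of floats

def inner_min(p):
    """Value A gets for a given p, after B best-responds (minimizes)."""
    return min(5 * p - 3, -7 * p + 4)

# One line increases and the other decreases, so their minimum peaks
# where they cross: 5p - 3 = -7p + 4, i.e. 12p = 7.
p_star = Fraction(7, 12)
print(p_star, inner_min(p_star))  # 7/12 -1/12

# Sanity check: no p on a coarse grid does better than p* = 7/12.
assert all(inner_min(Fraction(i, 12)) <= inner_min(p_star) for i in range(13))
```

Using `Fraction` keeps 7/12 and −1/12 exact, matching the values read off the board.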
I know this requires a little bit of thinking, but any clarification questions? I see a lot of lost faces. [01:03:20] Yeah, that interesting point is exactly right: A is due to lose either way. Even when A comes up with the best mixed strategy it can, showing one with probability 7/12 and showing two with probability 5/12 (this comes from here), A is still losing 1/12 of a dollar under that scenario. [01:03:49] Also, I haven't solved the simultaneous game yet, that's right; I have only talked about the setting where A plays first. So what if B plays first? I'm going to swap this: A goes second, B plays first; I'm going to modify this one now. [01:04:11] B goes first, A goes second. B starts by revealing its strategy, and that strategy
is again: with probability p I show you one, with probability 1 − p I show you two. Then A plays, trying to maximize, and A can play a pure strategy, because the best thing A can do is a pure strategy: always going with either showing one or two, and A is deciding which, but doesn't know yet. The values here are going to be exactly the same as before: 5p − 3 and −7p + 4. [01:04:56] So what's happening here? In this case A is playing second, and what A likes to do is maximize between 5p − 3 and −7p + 4; then B, going first, has to pick the p that minimizes that. These are exactly the same two lines, but now I'm picking the maximum of them, and the
[01:05:25] The maximum of these two lines ends up being at exactly the same point as before; it is exactly the same p as before, and it gives you exactly the same value as before. So this is also equal to -1/12. What this is telling me is that if you're playing a mixed strategy, even if you reveal your best mixed strategy at the beginning, it doesn't matter; it actually doesn't matter whether you're going first or second. So in the Morra game you were playing: if you were playing a mixed strategy and you told your opponent "this is the thing I'm going to do, and it's a mixed strategy", it doesn't matter whether they know it or not; you still get the same value. So again you get 5p - 3 and -7p + 4, and now you're taking the maximum of these two lines.
[01:06:15] The maximum of these two lines ends up being at the same point; you pick the p that maximizes it, and you get the same value. This is called von Neumann's theorem: for this whole thing that we just did on this one example, there's a theorem that says that for every simultaneous two-player zero-sum game with a finite number of actions, the order of play doesn't matter. Whether A is playing second or B is playing first, the values are going to be the same; whether you're minimizing over a maximum or maximizing over a minimum of that value, it's the same thing. So this is the third thing we just learned, von Neumann's minimax theorem. Writing a simpler, shorter version of it: if playing mixed strategies, the order of play doesn't matter. And remember, if you play a mixed strategy, your opponent is going to play a pure strategy.
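As a quick check of the numbers above, here is a small Python sketch. The payoff matrix is reconstructed from the lecture's two lines 5p - 3 and -7p + 4 (it matches the standard two-finger Morra payoffs), so treat it as an assumption rather than something stated explicitly here:

```python
from fractions import Fraction

# Payoffs for player A in two-finger Morra (rows: A shows 1 or 2 fingers;
# columns: B shows 1 or 2). Reconstructed from the lines 5p - 3 and -7p + 4.
V = [[2, -3],
     [-3, 4]]

def value_if_A_reveals(p):
    """A commits to P(show 1) = p; B then best-responds, minimizing A's payoff."""
    ev_b1 = p * V[0][0] + (1 - p) * V[1][0]   # = 5p - 3
    ev_b2 = p * V[0][1] + (1 - p) * V[1][1]   # = -7p + 4
    return min(ev_b1, ev_b2)

def value_if_B_reveals(q):
    """B commits to P(show 1) = q; A then best-responds, maximizing A's payoff."""
    ev_a1 = q * V[0][0] + (1 - q) * V[0][1]
    ev_a2 = q * V[1][0] + (1 - q) * V[1][1]
    return max(ev_a1, ev_a2)

# The two lines cross where 5p - 3 = -7p + 4, i.e. p = 7/12.
p_star = Fraction(7, 12)
print(value_if_A_reveals(p_star))   # -1/12
print(value_if_B_reveals(p_star))   # -1/12
```

Either order of revelation gives the same game value, -1/12, which is exactly the von Neumann statement for this example.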
[01:07:31] This is the first point, right: if you play a mixed strategy, your opponent is going to follow a pure strategy. (Student question, partly inaudible: one of the two answers looks valid; it will be either one or two, and in that case the second one...) Yeah, so the thing is, these two end up being equal, so it doesn't matter. The way for you to maximize this is to land at the point where the two branches end up being equal: if you actually plug in p equal to 7/12 here, these two values end up being equal. That's not an approximation; they're actually equal. And the reason they end up being equal is that you are trying to minimize the thing that this guy is trying to maximize, so you're trying to
pick the P that actually makes this thing equal so no matter what your [01:08:32] thing equal so no matter what your opponent's does like you're gonna like [01:08:35] opponent's does like you're gonna like get the best thing that you can do so so [01:08:36] get the best thing that you can do so so yeah like think of it like this okay so [01:08:38] yeah like think of it like this okay so I'm player a I'm still I still have a [01:08:40] I'm player a I'm still I still have a choice my choice is to pick a P I want [01:08:42] choice my choice is to pick a P I want to pick a P that I'm not gonna like lose [01:08:45] to pick a P that I'm not gonna like lose as much what P should I pick I should [01:08:47] as much what P should I pick I should pick a P that makes these choices the [01:08:49] pick a P that makes these choices the same because if I pick a P that makes [01:08:51] same because if I pick a P that makes this one higher than this one of course [01:08:53] this one higher than this one of course the second player is going to make me [01:08:54] the second player is going to make me lose and then go down the routes that's [01:08:56] lose and then go down the routes that's that's better for the second player so [01:08:58] that's better for the second player so the best thing that I can do here is [01:08:59] the best thing that I can do here is make these two as equal as possible so [01:09:02] make these two as equal as possible so then the second player whatever they [01:09:04] then the second player whatever they choose choose one or two like it's gonna [01:09:06] choose choose one or two like it's gonna be the same thing it's going to be those [01:09:08] be the same thing it's going to be those does that make sense no expectations we [01:09:12] does that make sense no expectations we multiplied by P and one was easier [01:09:14] multiplied by P and one was easier saying like oh so in expectation you're [01:09:17] saying like oh so in expectation you're saying when 
[01:09:18] I'm treating p as a variable that I'm deciding, right: p is the thing I get to decide. I'm player A, and I've got to decide on a p that's not going to be too bad for me. Let's say I picked a p that doesn't make these things equal; say it makes this one 10 and this one 5. The second player is of course going to make me lose, and is going to pick the thing that's worst for me. So the best thing I can do is make both of them, I don't know, 7, so it's not as bad. That's kind of the idea. All right, let's move forward, because there are so many things happening. OK, so the key idea here is that revealing your optimal mixed strategy does not hurt you, which is kind of a cool idea.
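The claim can be made concrete as an optimization. For player A committing to a mixed strategy π over its actions, with payoff matrix V(a, b), each pure response b of the opponent contributes one constraint, and the best revealed strategy solves:

```latex
\max_{\pi,\, v} \; v
\quad \text{s.t.} \quad
\sum_{a} \pi(a)\, V(a, b) \;\ge\; v \quad \text{for all } b,
\qquad
\sum_{a} \pi(a) = 1,
\qquad
\pi(a) \ge 0 .
```

This is a linear program with one constraint per opponent action; its dual is the opponent's corresponding minimization, and LP duality giving equal optimal values is one way to prove the minimax statement.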
[01:10:08] The proof of that is interesting; if you're interested, look at the notes. You can use linear programming here. The intuition behind it is that if you're playing a mixed strategy, the other person has to play a pure strategy, and they have n possible options for that pure strategy. That creates n constraints that you put into your optimization; you end up with a single optimization with n constraints, and then you can use linear programming duality to actually solve it. So you could compute this using linear programming; that's the one-line summary here. So let's summarize what we have talked about so far. We have talked about these simultaneous games, and we've talked about the setting with pure strategies, where we saw that going second is better. Going second is better if they're
just telling you what pure strategy they're using, right. That was the first point. [01:10:57] And then if you're using mixed strategies, it turns out it doesn't matter whether you're going first or second: you tell them what your best mixed strategy is, and they're going to respond based on that. That's the von Neumann minimax theorem. All right, for the next ten minutes I want to spend a little bit of time talking about non-zero-sum games. So far we have talked about zero-sum games, where it's minimax: I get some reward and you get the negative of it, or vice versa. There are also these other things called collaborative games, where we are both just maximizing the same thing, so we both get money out of it, and that's like a single optimization, a single maximization; you can think of it as just doing search. In real life you're often somewhere in
between those, and I want to motivate that with an example: the prisoner's dilemma. [01:11:44] How many of you have heard of the prisoner's dilemma? OK, good. The idea of the prisoner's dilemma is that a prosecutor asks A and B individually whether they will testify against each other or not. If both of them testify, both are sentenced to five years in jail. If both of them refuse, both are sentenced to one year in jail. If only one testifies, then he or she gets out free, and the other one gets a ten-year sentence. Play with your partner real quick. [01:12:33] OK, so let's look at the payoff matrix. I think you have an idea of how the game works by now. You have two players, A and B, and each of you has an option: you can either testify or refuse to
testify. [01:12:58] I'm going to create this payoff matrix, and it is now going to have two entries in each of these cells. Why is that? Because we have a non-zero-sum game. Before, our payoff matrix had only one entry, because it was for player A, and player B would just get the negative of it. But now A and B are getting different values. If both of us testify, both of us get five years of jail: A gets five years, B gets five years. If both of us refuse, A gets one year of jail and B gets one year of jail. And if one of us testifies and the other refuses, one gets zero and the other gets ten years of jail: if A refuses to testify while B testifies, A gets ten years and B gets zero, and in the opposite case A gets 0 and B gets ten. So now,
for every player, we are going to have a payoff matrix. [01:14:04] So now we have this value function V, which is a function of a player, a policy π_A, and a policy π_B, and it is the utility for that particular player, because you might be looking at the game from the perspective of different players. OK, so the von Neumann minimax theorem doesn't really apply here, because we don't have a zero-sum game, but you actually get something a little bit weaker, and that's the idea of a Nash equilibrium. A Nash equilibrium is a pair of policies π*_A and π*_B such that no player has an incentive to change their strategy. What does that mean? If you look at the value function from the perspective of player A, the value for A at the Nash equilibrium, at π*_A and π*_B, is greater than or equal to the value for A of any other policy π_A if you
fix π*_B: V_A(π*_A, π*_B) ≥ V_A(π_A, π*_B) for all π_A. [01:14:58] And at the same time, the same thing is true for the value of B: the value for agent B at the Nash equilibrium is greater than or equal to the value for B at any other π_B if π*_A is held fixed, that is, V_B(π*_A, π*_B) ≥ V_B(π*_A, π_B) for all π_B. OK, so what does that mean in this setting? Do we have a Nash equilibrium here? Let's say I start from this cell, A equal to minus 10, B equal to 0. Can we make this better? (Did I flip them? I only flipped these, right: 0, minus 10 and minus 10, 0.) OK, so let's say I start from this cell: A gets 0 years of jail, which is pretty good, and B gets 10 years of jail, which is not that great. So B has an incentive to change that, right: B has an incentive to move in this direction and get 5 years of jail instead of 10. Similar thing here: what if we start
from here: A has one year of jail, B has one year of jail. [01:16:03] A has an incentive to change this and get 0 years of jail, and B has an incentive to change and get 0. And we end up in this cell, where neither of us has any incentive to change our strategy. So we have one Nash equilibrium here, and that Nash equilibrium is both of us testifying and both of us getting 5 years of jail. It's kind of interesting, because there is a socially better choice here: if both of us were to refuse, we would each get one year of jail, but that's not going to be a Nash equilibrium. [01:16:44] All right, so there's a theorem, Nash's existence theorem, which basically says that in any finite-player game with a finite number of actions, there exists at least one Nash equilibrium. And this is usually a
mixed-strategy Nash equilibrium. [01:17:00] In this case it's actually a pure-strategy Nash equilibrium, but in general there's at least one Nash equilibrium in a game like this one. OK, all right, so let's look at a few other examples. Two-finger Morra: what would be a Nash equilibrium for that? We actually just solved that using the von Neumann minimax theorem, right: it would be playing the mixed strategy of 7/12 and 5/12. You might also modify your two-finger Morra game and make it collaborative. In a collaborative setting, what that means is that we both get two dollars, or we both get four dollars, or we both lose three dollars. So a collaborative two-finger Morra game is not a zero-sum game anymore, and it has two Nash equilibria: a setting where A and B
both play one and the value is two, or where A and B both play two and the value is 4. [01:18:04] And then the prisoner's dilemma is the case where both of them testify; we just saw that on the board. All right, so the summary so far: we've talked about simultaneous zero-sum games, and we talked about the von Neumann minimax theorem, where you can have multiple minimax strategies but a single game value; we had a single game value because the game was zero-sum. In the case of non-zero-sum games we have something slightly weaker, Nash's existence theorem: we can have multiple Nash equilibria, and we also have multiple game values, depending on whose perspective you're looking at. So this was just a brief, short introduction to game theory.
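The equilibrium claims in this summary can be double-checked by brute force. A sketch: enumerate all pure strategy profiles and keep those where neither player can gain by deviating unilaterally (jail years are written as negative utilities; the payoff numbers follow the lecture):

```python
from itertools import product

def pure_nash(payoff_A, payoff_B):
    """Enumerate pure-strategy Nash equilibria of a two-player matrix game."""
    n, m = len(payoff_A), len(payoff_A[0])
    equilibria = []
    for a, b in product(range(n), range(m)):
        best_A = all(payoff_A[a][b] >= payoff_A[a2][b] for a2 in range(n))
        best_B = all(payoff_B[a][b] >= payoff_B[a][b2] for b2 in range(m))
        if best_A and best_B:
            equilibria.append((a, b))
    return equilibria

# Prisoner's dilemma (index 0 = testify, 1 = refuse), negative years in jail.
A = [[-5, 0], [-10, -1]]
B = [[-5, -10], [0, -1]]
print(pure_nash(A, B))   # [(0, 0)]  -> both testify

# Collaborative two-finger Morra: both players share the same payoff.
C = [[2, -3], [-3, 4]]
print(pure_nash(C, C))   # [(0, 0), (1, 1)]
```

The prisoner's dilemma has the single pure equilibrium (testify, testify), while the collaborative Morra variant has the two equilibria (1, 1) and (2, 2), matching the discussion above.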
[01:18:53] In game theory and economics there is a huge literature around different types of games; if you're interested in that, take classes on it. And yeah, there are other types of games too, like security games or resource allocation games, that have some characteristics similar to the things we have talked about. If you're interested in any of them, take a look; they could be useful for projects. And with that, I'll see you guys next time.

================================================================================
LECTURE 023
================================================================================
Constraint Satisfaction Problems (CSPs) 1 - Overview | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=-IO4fPO0rxk
---
Transcript

[00:00:05] Hi. In this module I'm going to talk about constraint satisfaction problems. [00:00:12] Before we get into constraint satisfaction problems, I just want to revisit where we've been in the course. We started off with machine learning, applied to reflex-based models such as classification or regression, where the
goal is just to output a single number or a label. [00:00:32] Then we looked at state-based models, where the goal was to output a solution path, and we thought in terms of states, actions, and costs or rewards. And now we're going to embark on a new journey through variable-based models. It's going to be a new paradigm for modeling, in which we're going to think in terms of variables and factors. [00:00:55] The heart of variable-based models is an object called a factor graph. We're going to define factor graphs formally in the next module, but for now let's just try to give some intuition. A factor graph consists of a set of variables, usually denoted x1, x2, x3; these are drawn in circles. A factor graph also contains a set of factors, usually denoted f1, f2, f3, f4; these are drawn in squares. Each factor, as you'll notice here, touches a subset of the variables.
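As a rough code sketch of that picture (the domains and the concrete contents of f1 through f4 below are invented for illustration; the formal definition comes in the next module):

```python
# Three variables and four factors, mirroring the x1..x3 / f1..f4 picture.
# Each factor is stored with the subset of variables it touches (its scope).
factors = {
    "f1": (("x1",),      lambda x1: x1 == 1),        # touches only x1
    "f2": (("x1", "x2"), lambda x1, x2: x1 != x2),   # relates x1 and x2
    "f3": (("x2", "x3"), lambda x2, x3: x2 != x3),   # relates x2 and x3
    "f4": (("x3",),      lambda x3: x3 == 1),        # touches only x3
}

def weight(assignment):
    """Product of all factor values; the 'best' assignment maximizes this."""
    w = 1
    for scope, f in factors.values():
        w *= f(*(assignment[v] for v in scope))
    return w

print(weight({"x1": 1, "x2": 0, "x3": 1}))  # 1: every factor is satisfied
print(weight({"x1": 1, "x2": 1, "x3": 1}))  # 0: f2 is violated
```

Here every factor is a 0/1 constraint, so the best assignments are exactly the ones with weight 1; factors can also return general nonnegative preferences.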
[00:01:36] Each factor is going to express some sort of preference about, or determine the relationship among, the subset of variables it touches. So for example, f2 is going to specify how x1 and x2 are related, f3 is going to specify how x2 and x3 are related, and f4 is going to specify how x3 itself should behave. The objective of a constraint satisfaction problem is to find the best assignment of values to the variables, where we're going to define what "best" means in a second. [00:02:12] So let's look at an example of a problem that can be solved via a constraint satisfaction problem. Here's map coloring, a classic problem. Here is a map of Australia. We have a number of provinces, seven to be exact, and each province (Western Australia, Northern Territory, South Australia, etc.) has to be assigned a color. The question is: how can we color each province either red, green, or blue so that no two
neighboring provinces have the same color? So we don't want Western Australia and Northern Territory to have the same color.

Here is one possible solution: we can color Western Australia red, Northern Territory green, and so on, and you can double-check that no two adjacent provinces have the same color here. Now, this is a simple enough problem that we can just solve it by hand, but as usual we want to ask: what are the algorithmic principles, and how do we come up with something more general to solve problems such as these when we encounter them?

Before we talk about how we do this with constraint satisfaction problems, I want to revisit how we might do it as a state-based model, because that's the hammer we have. So let's try to cast this as a search problem. We're going to start with an initial state, and this state is going to
represent not having assigned any provinces any colors. Then from that state we can take three possible actions: we can grab WA and assign it red, we can grab WA and assign it green, or we can grab WA and assign it blue. From each of these points we can take NT and assign it red, green, or blue; red, green, or blue; red, green, or blue. You can see that this is a search tree like the ones that we have studied before.

At the very bottom of the search tree we have a complete assignment to all the variables, and each complete assignment is going to be labeled with a zero if it is inconsistent, in other words if it doesn't solve the problem. Here the problem is that NT and SA are assigned the same color; that's bad. Here's another complete assignment; this is also bad because WA and NT share the same color. Here is an assignment that is good, and
you can verify that all the provinces that are neighboring each other have different colors; this is going to be denoted with a weight of one. So in general, each state here represents a partial assignment of colors to variables, and at the end of the day we can simply return any leaf that is consistent, for example this one.

So this is a perfectly fine way of solving this problem, and it goes to show how powerful these state-based models can be. Just to recap: the state here is a partial assignment of colors to provinces, and from each state an action assigns the next uncolored province a compatible color. So what's missing? Why are we talking about this when we already know how to solve it using a state-based model? Well, the question is: can we do better than this? The answer is going to be yes, because there is more problem structure.
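The state-based search just described (assign provinces in a fixed order, then check each leaf for consistency) can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the course; only the province abbreviations and the red/green/blue domain come from the lecture's map-coloring example.

```python
# Hypothetical sketch: map coloring of Australia as naive tree search.
# States are partial assignments; complete leaves get weight 1
# (consistent) or 0 (inconsistent).
NEIGHBORS = [("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"),
             ("SA", "Q"), ("SA", "NSW"), ("SA", "V"), ("Q", "NSW"),
             ("NSW", "V")]
PROVINCES = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
COLORS = ["R", "G", "B"]

def consistent(assignment):
    """Weight-1 leaves: no two neighboring provinces share a color."""
    return all(assignment[a] != assignment[b] for a, b in NEIGHBORS)

def search(assignment):
    """Assign provinces in a fixed order; return the first consistent leaf."""
    if len(assignment) == len(PROVINCES):
        return assignment if consistent(assignment) else None
    province = PROVINCES[len(assignment)]   # next uncolored province
    for color in COLORS:
        result = search({**assignment, province: color})
        if result is not None:
            return result
    return None

solution = search({})  # a complete, consistent coloring
```

Note that this version only checks consistency at the leaves, exactly like the search tree in the lecture; the speedups discussed next come from checking and pruning much earlier.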
Let me say what I mean by that. Notice that in this problem there's just a bunch of provinces that all need to get assigned colors; it doesn't matter in which order I assign the colors. In other words, the variable ordering doesn't affect correctness, which means that we don't have to stick with a fixed ordering: we can optimize this ordering, and this is something that the inference algorithm can do for us. Secondly, the variables here are interdependent in only a local way, and we can decompose the problem. For example, here we see that Tasmania is completely separated from the rest of Australia, which means that we can effectively solve the two separate independent problems separately and just combine the solutions. As we'll see later, this is great because it allows us to really speed up search.
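The decomposition point can be made concrete: treat the provinces as nodes and shared constraints as edges, and Tasmania ends up in its own connected component. A hypothetical sketch (the helper names are mine, not from the lecture):

```python
# Hypothetical sketch: finding independent subproblems as connected
# components of the constraint graph. Tasmania ("T") shares no
# constraint with the mainland, so it can be colored separately.
NEIGHBORS = [("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"),
             ("SA", "Q"), ("SA", "NSW"), ("SA", "V"), ("Q", "NSW"),
             ("NSW", "V")]
PROVINCES = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]

def components(variables, edges):
    """Group variables linked, directly or indirectly, by some factor."""
    adj = {v: set() for v in variables}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for v in variables:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

Here `components(PROVINCES, NEIGHBORS)` yields two components, the six mainland provinces and Tasmania alone, so the search trees multiply (3^6 * 3) instead of growing as one 3^7 tree.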
So variable-based models allow us to capture these two additional pieces of structure. "Variable-based models" is an umbrella term that includes constraint satisfaction problems, Markov networks, and Bayesian networks, all of which we're going to get through over the next few weeks. The key idea behind variable-based models is that we want to think in terms of variables, and a solution to a problem is simply an assignment to the variables. So when you're modeling with variable-based models, you want to set up a set of variables so that the solution is an assignment to those variables. The decisions about how to choose the ordering of the variables and how to determine which variables to set first are going to be made by the inference algorithm. And the key idea here is that you can think about variable-based models as a higher-level modeling language
than state-based models. So here's an imperfect analogy from programming languages. If you were just trying to solve a problem directly, in an ad hoc way, that's kind of like writing in assembly: you just go at it. If you were using, you know, C or C++, that's kind of like using state-based models: it gives you a higher-level abstraction, which is powerful and saves you a lot of headaches. But variable-based models are an even higher-level language, let's say Python, which allows you to think purely in terms of the variables and the modeling, and to let the inference algorithm do more of the work, which is always good, because then you can spend more time doing the fun stuff, which is the modeling.

So first I'm going to talk about constraint satisfaction problems. Constraint satisfaction problems appear
in a number of applications, most of which revolve around large-scale logistics, scheduling, and supply chain management. Companies such as Amazon have to figure out how to put packages on vehicles and deliver them to customers, while at the same time minimizing costs and meeting all those promised delivery times. Here the variables might be the assignment of packages to vehicles, and the factors would include travel times and various costs. Ride-sharing services such as Uber and Lyft also have to figure out how to best assign drivers to riders, and all of these are extensions of the classical vehicle routing problem.

Here's another example from sports scheduling: every year, the NFL has to schedule which teams play which other teams and when these games are going to be held. The schedule should minimize the travel times of
teams, the games have to be at times that fit the TV broadcast schedule, you want to be fair across teams, and so on. Other scheduling problems such as these also involve assigning courses to slots: the registrar's office has a number of courses that need to be offered every quarter, and they have to figure out which classrooms to hold these courses in and at which time slots, again trading off various constraints like preferences and availability.

A final application of constraint satisfaction problems is a little bit different: this is called formal verification of circuits and programs. Say you have a computer program and you want to prove that this program is correct; let's say the program is trying to do something like sort numbers. Normally you would, let's say,
test the program: design a bunch of test cases, run the program, and see what happens. But then how do you know for sure that it works on all inputs? This is where verification comes in: you want to actually check that it works for all inputs. The way you would set this up is that you define a set of variables which correspond to the unknown inputs to the program, and then the factors encode the program itself: they're going to encode how execution proceeds from line to line. Then you're going to ask whether there exists a program input that produces an error or an incorrect result. So unlike the other applications of CSPs, where you're trying to find a satisfying assignment, in formal verification you're trying to prove that no such satisfying assignment exists, because such an assignment would mean an error in your program.
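To make that framing concrete, here is a toy sketch (my own example, not from the lecture): the variables are a program's inputs, and an assignment is "satisfying" when the program disagrees with a specification. Finding a satisfying assignment exhibits a bug; finding none verifies the program over this small domain.

```python
# Toy sketch: verification as search for a satisfying assignment.
# Variables: the unknown inputs (a, b). A satisfying assignment is
# one on which the program's output is wrong, i.e. a bug.
from itertools import product

def careful_max(a, b):
    return a if a >= b else b          # correct maximum of two numbers

def broken_max(a, b):
    return a if a > b else a           # bug: always returns a

DOMAIN = range(-2, 3)                  # exhaustively check all small inputs

def find_counterexample(program):
    """Return an input pair on which the program is incorrect, or None."""
    for a, b in product(DOMAIN, repeat=2):
        if program(a, b) != max(a, b):
            return (a, b)
    return None  # no satisfying assignment exists: verified on this domain
```

Real verifiers encode the program's semantics symbolically in the factors rather than enumerating inputs; this sketch only illustrates the "prove no satisfying assignment exists" logic.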
So here is a road map for the rest of the modules on CSPs. First we're going to talk about the definition of a constraint satisfaction problem and of factor graphs, and do it more formally. Then we're going to give a few examples of CSPs. Then we're going to move over to inference. We're going to start by talking about backtracking search, which is unfortunately exponential time in the worst case, but there are a number of ways to speed up search. Taking full advantage of the fact that we can assign variables in any order, we can look at dynamic ordering, where we use heuristics to figure out which variables to assign first. Then we're going to look at a pruning strategy based on arc consistency, which is going to allow us to prune out, for each of the variables, values which are not promising to explore, so that dynamic ordering can be
much more effective.

But in case you're impatient and don't want to wait an exponential amount of time, and you're satisfied with an approximate solution, you can also do approximate search. Here there are two algorithms: beam search, which is kind of an extension of the greedy search algorithm but a little bit smarter (it's going to explore only a small fraction of the exponentially sized search tree), and local search, which is going to take an initial assignment to all the variables and just try to improve it by changing one variable at a time. All right, so that's it for this overview module.

================================================================================
LECTURE 024
================================================================================
Constraint Satisfaction Problems (CSPs) 2 - Definitions | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=uj5wCcHsSlA
---
Transcript

Hi, in this module I'm going to formally define constraint satisfaction problems and the more general notion of a factor
graph. So let's begin with an example, a voting example. Let's imagine there are three people, person one, person two, and person three, and each one is going to cast a vote, either blue or red. And we know something about these people: we know that person one is definitely going to vote blue, and we know that person three is leaning red. We also know that person one and person two are really close friends, so they must agree on their vote, whereas person two and person three are mere acquaintances, and their votes only tend to agree. So the question is: how are all these people going to influence each other and ultimately cast their votes?

We can model this problem using a factor graph. We're going to define a set of variables, x1 for person 1, x2 for person 2, x3 for person 3, and we're going to define a set of factors
that capture each of these four constraints or preferences. Let's begin with f1. f1 is going to capture the fact that person 1 is definitely blue. I'm going to write f1 as a table specifying, for each value of x1, a number: f1 of x1 is going to be 0 if x1 is r (red), and it's going to be 1 if x1 is b (blue). This captures the fact that 0 means there's no way this is going to happen, and 1 means it's okay. Mathematically, I can write this factor f1 as an indicator function of x1 = b. Usually you would write a 1 in front of these indicator functions, but I'm just going to drop it for notational simplicity.

Now let's look at "leaning red." This factor is going to be f4, and it's also going to correspond to a table, where for every possible value of x3 I'm
going to specify a value: r is going to be 2 and b is going to be 1. Mathematically, f4 is equal to the indicator function of x3 = r, plus a smoothing constant of 1. Remember, this indicator is going to return one or zero depending on whether its condition is true or false, and I'm adding one, so I offset that to a two or a one. Intuitively, you can think about this as person three preferring r, maybe twice as much.

Now let's look at these other factors. f2 is going to represent the fact that person 1 and person 2 have to agree. Again, I'm going to look at all the possible assignments to the variables in the scope of f2, these two variables x1 and x2, and for every value I'm going to assign a particular non-negative number. So here, for rr, I'm going to say that's a one: it's okay, they agree. If they don't agree, I'm going to return
0, because I really don't like that, and if they're both b, they agree, so that's a 1. More succinctly, I can write this factor f2 as the indicator of x1 = x2.

And now finally, f3. f3 is going to capture whether x2 and x3 tend to agree, and its table is going to look like this for x2 and x3: if they're both r I'm going to return 3, if they're different I'm going to return 2, and if they're both b then I'm going to return 3. Mathematically, this factor is going to be the indicator function of whether x2 = x3, plus a smoothing constant of 2, which turns 1 0 0 1 into 3 2 2 3.
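The four factor tables above can be written out directly. This is a hypothetical encoding (representing the values as the strings "b" and "r" is my choice, not the lecture's), which is handy for checking the numbers in the tables:

```python
# Hypothetical encoding of the four voting factors as lookup tables.
f1 = {"b": 1, "r": 0}                        # person 1 is definitely blue
f4 = {"b": 1, "r": 2}                        # person 3 leans red
f2 = {(x1, x2): 1 if x1 == x2 else 0         # persons 1 and 2 must agree
      for x1 in "br" for x2 in "br"}
f3 = {(x2, x3): (1 if x2 == x3 else 0) + 2   # persons 2 and 3 tend to agree
      for x2 in "br" for x3 in "br"}

def factor_values(x1, x2, x3):
    """Evaluate every factor on one complete assignment."""
    return [f1[x1], f2[(x1, x2)], f3[(x2, x3)], f4[x3]]
```

For example, `factor_values("b", "b", "r")` gives [1, 1, 2, 2]: person 1 is blue (1), persons 1 and 2 agree (1), persons 2 and 3 disagree (2), and person 3 is red (2). As noted shortly, these numbers are what will later get multiplied together.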
So there's kind of a mild preference for these two people to agree, compared to not agreeing. Now, if you click on the demo in the slides, it's going to take you to a little JavaScript application where you can actually write your own factor graph, and we're going to come back to this later. So this is our first example of a factor graph, capturing this simple voting situation.

Now let's look at a different example, one that we looked at in the overview module: map coloring of Australia. Remember, Australia has these seven beautiful provinces, and each one needs to be assigned a color. Each of these provinces is going to be represented as a variable, and here I'm going to give every variable a name: WA for Western Australia, NT for Northern Territory, and so on. And I'm going to use big X, usually, to denote the set of all variables. Each
variable is also going to take on a set of values, which in this case is going to be red, green, or blue. And now I'm going to define the factors of this factor graph. For every two neighboring provinces, I want to say that they can't have the same color. So for example, f1 is going to say that WA and NT must be different; that corresponds to this factor over here. f2 says that NT and Q must be different, and that's going to correspond to this factor here, and so on and so forth.

So now we're ready to formally define a factor graph. A factor graph consists of a set of variables, x1 through xn in the general case (remember, big X is going to denote the set of all variables), where each variable xi takes on values in some set of possible values known as the domain of variable i. And a factor graph also consists of a set of factors, generally denoted f1 through fm.
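Instantiated on the map-coloring example, that formal definition might look like the following sketch (my own encoding, not the course's code): each factor is a pair of the variables it touches and a non-negative function of those variables.

```python
# Hypothetical encoding of the map-coloring factor graph, following
# the formal definition: variables with domains, plus factors.
VARIABLES = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
DOMAINS = {v: ["R", "G", "B"] for v in VARIABLES}

def different(x, y):
    """A constraint: 1 if the two colors differ, 0 otherwise."""
    return 1 if x != y else 0

# Each factor is (variables touched, function); one per neighboring pair.
FACTORS = [(("WA", "NT"), different), (("WA", "SA"), different),
           (("NT", "SA"), different), (("NT", "Q"), different),
           (("SA", "Q"), different), (("SA", "NSW"), different),
           (("SA", "V"), different), (("Q", "NSW"), different),
           (("NSW", "V"), different)]

def scope(factor):
    variables, _ = factor
    return set(variables)  # the set of variables the factor depends on
```

Every factor here touches exactly two of the seven variables, which matches the terminology introduced next: each is a binary factor, and since `different` only returns 0 or 1, each is a constraint.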
Each fj is going to be a function that takes as input an assignment to the variables and returns a non-negative number. It's really important that this function return a non-negative number rather than a negative number, because later we'll see that we're going to multiply them together. So that's the definition of a factor graph.

[00:07:30] A bit of terminology here: I'm going to define the scope of a factor as the set of variables it depends on. So in the map coloring example, the scope of f1 is simply WA and NT. Visually, this corresponds to the set of variables that this factor is touching. The arity of a factor is the number of variables in the scope; in this case you just count how many variables are here, and the answer is two.
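The scope and arity definitions can be sketched in a few lines of Python. This dict-based encoding of an assignment is just one possible representation for illustration, not the demo's actual code:

```python
# Sketch of the scope/arity terminology, using the map-coloring factor f1.

def f1(assignment):
    """Factor on WA and NT: returns 1 if they differ, else 0 (a constraint)."""
    return 1 if assignment["WA"] != assignment["NT"] else 0

scope_f1 = {"WA", "NT"}      # the set of variables f1 depends on
arity_f1 = len(scope_f1)     # number of variables in the scope

print(arity_f1)                           # 2, so f1 is a binary factor
print(f1({"WA": "red", "NT": "green"}))   # 1: different colors are allowed
print(f1({"WA": "red", "NT": "red"}))     # 0: the constraint vetoes this
```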
Some shorthand notation: unary factors are ones that have arity 1, and binary factors are ones that have arity 2. Constraints are factors that return 0 or 1. So notice that a factor can return any non-negative number, but a special case is when it returns 0 or 1, which essentially means yes or no. In this context, f1 is a binary constraint. One thing to remember about factors is that each factor usually depends only on a subset of the variables, not all the variables, and this is going to be important when we talk about algorithmic efficiency.

[00:08:59] So now that we've fully defined what a factor graph is, I'm going to talk about the notion of assignment weight. Let's go back to the voting example. In the voting example we had four factors, corresponding to whether person one and person three were voting a certain way, and whether person
one and person two, and person two and person three, agreed or not.

[00:09:25] So an assignment is just an assignment of values to each of the variables; in this case there are three variables, x1, x2, x3, and each assignment is going to be associated with a weight. Here's how the weight is calculated: I'm going to go through each of these factors, plug in this assignment, and read out a particular number. So let's take this factor f1: what is x1? It's R, so I'm going to get a 0. What about this factor: what are x1 and x2? They're R, R, so I'm going to return a 1; let me copy that down here. For this factor, x2 and x3 are R, R, so I'm going to get a 3. And finally the fourth factor, f4: what is x3? It's R, so I'm going to read out a 2.
All these outputs of the factors are numbers; I'm going to multiply all of them together to get a weight, and that weight in this case is 0. Now you can go through all the other possible assignments of values to the variables; in this case there are eight possible assignments, and each of them is going to have a particular weight.

[00:10:46] So now let's look at the demo. If you click step here, that's going to run this inference algorithm and produce a weight for every possible assignment that has non-zero weight. In this case we verify that there are two possible assignments that have non-zero weight: B, B, R and B, B, B.

[00:11:16] Okay, so now let's switch over again to the map coloring example, just to see how weights are computed here. So here is a possible assignment of colors to provinces.
[00:11:33] So here, notationally, I'm going to make a slight change: it's sometimes going to be convenient to represent assignments in this kind of dictionary format, where the variables have names. So here I have WA assigned red, NT assigned green, and so on and so forth; you can literally think about this as a Python dictionary if you like.

[00:11:57] What is the weight of this assignment? Well, in this particular case all neighbors have different colors, and remember, each factor is just going to give a thumbs up, returning one, if the two adjacent neighbors have different colors. So I'm just going to get one times one times one, and that's just one. Now consider an alternative assignment where I've simply replaced NT with red here, so NT becomes red. And now we can see that the weight of this altered assignment is going to be
zero, because these two factors are going to evaluate to zero: these two here. One thing you might realize very quickly is that all it takes is one factor to veto the entire assignment, because we're multiplying: if one of the factors returns zero, then the product of all the factors is also going to be zero.

[00:13:04] So here is the general definition of assignment weight. An assignment, little x, is x1 through xn, and its weight is a function that takes an assignment and returns the product over all the factors: Weight(x) = f1(x) · f2(x) · ... · fm(x). Note that even though each factor only depends on a subset of the variables, I'm simplifying notation here by passing in the entire assignment; in practice I would pass in only the variables that are in the scope of fj.
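This weight computation, and the max-weight objective the lecture turns to next, can be sketched by brute force on the Australia map-coloring example. The adjacency list below is the standard one for this example (Tasmania is unconstrained); the exhaustive search is for illustration only and is not an efficient inference algorithm:

```python
from itertools import product

variables = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
domain = ["red", "green", "blue"]
neighbors = [("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"),
             ("SA", "Q"), ("SA", "NSW"), ("SA", "V"), ("Q", "NSW"),
             ("NSW", "V")]

# One binary constraint per pair of neighboring provinces.
factors = [lambda x, a=a, b=b: 1 if x[a] != x[b] else 0 for a, b in neighbors]

def weight(x):
    """Weight(x) = product of all factors applied to the assignment x."""
    w = 1
    for f in factors:
        w *= f(x)   # a single 0 vetoes the whole assignment
    return w

# A proper coloring has weight 1 * 1 * ... * 1 = 1...
good = {"WA": "red", "NT": "green", "SA": "blue", "Q": "red",
        "NSW": "green", "V": "red", "T": "green"}
print(weight(good))   # 1

# ...but recoloring NT red clashes with both WA and Q, so the weight is 0.
bad = dict(good, NT="red")
print(weight(bad))    # 0

# The CSP objective: argmax over all 3^7 assignments of weight(x).
best = max((dict(zip(variables, vals)) for vals in product(domain, repeat=7)),
           key=weight)
print(weight(best))   # 1, so this CSP is satisfiable
```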
A bit of terminology: an assignment is consistent if its weight is greater than zero. The weight can't be negative, because all the factors return non-negative numbers; so if the weight is zero, that means the assignment is inconsistent. And the objective of a constraint satisfaction problem, finally getting to the point of all this, is to find the maximum weight assignment; mathematically it's written as the argmax, over all possible assignments x, of Weight(x). A constraint satisfaction problem is said to be satisfiable if the weight of a maximum weight assignment is greater than zero; another way to say the same thing is that there exists some consistent assignment.

[00:14:45] Note one thing: the weight here, in the context of factor graphs and constraint satisfaction problems, is not the same as the weights that we study in machine learning. Those
weights can be negative or non-negative, but the weights in constraint satisfaction problems and factor graphs have to be non-negative. One other small comment: here we are actually defining a slight generalization of constraint satisfaction problems, where factors can have not just zero or one as weights, but any non-negative value.

[00:15:29] Constraint satisfaction problems are actually a general umbrella term that captures several important cases. The first is boolean satisfiability problems, otherwise known as SAT. In these cases the variables are boolean-valued and the factors are logical formulas, such as x1 OR NOT x2 OR x5. Satisfiability problems are NP-complete, which means that in the worst case they're really, really hard and we don't have efficient algorithms for solving them.
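To make the SAT setting concrete, here is a toy brute-force check of the clause just mentioned, x1 OR (NOT x2) OR x5, treated as a 0/1-valued factor over boolean variables. Real SAT solvers (DPLL, CDCL) are vastly more sophisticated, but on five variables plain enumeration is instant:

```python
from itertools import product

def clause(x1, x2, x3, x4, x5):
    """The clause x1 OR (NOT x2) OR x5 as a constraint factor (0 or 1)."""
    return 1 if (x1 or (not x2) or x5) else 0

# Enumerate all 2^5 boolean assignments and keep the satisfying ones.
satisfying = [vals for vals in product([False, True], repeat=5)
              if clause(*vals) == 1]
print(len(satisfying) > 0)   # True: the formula is satisfiable
print(len(satisfying))       # 28 of the 32 assignments satisfy the clause
```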
But in practice, it turns out that there's been an extraordinary amount of progress in SAT solving, and we can routinely solve SAT problems with many, many more variables than we might predict by theory alone. There's a joke that says: theoreticians reduce a problem to SAT if they want to show that it's hard to solve, and practitioners reduce a problem to SAT if they want to solve the problem.

[00:16:38] Another class of problems that is important is linear programming. In linear programs the variables are real-valued numbers and the factors are linear inequalities, such as x2 + x3 + x5 ≤ 1. Despite the fact that the variables can take on an infinite number of values, linear programs have a special structure that makes them especially efficient to solve, and there's been a lot of work in solving linear programs efficiently.
Integer linear programs are the same as linear programs, except that the variables are integer-valued, and the fact that they're integer-valued makes these incredibly hard, again just like satisfiability problems. Mixed integer linear programs are problems where some variables are reals and some are integers, and these problems are also hard to solve.

[00:17:35] So in summary, we formally defined the notion of a factor graph, which includes variables and factors. Variables specify unknown quantities that we need to ascertain, and factors specify preferences or constraints over partial assignments. One thing that's special about factor graphs is that you're specifying constraints and preferences in a local way: suppose you're modeling, and you think of a particular preference that you have; you can just simply write down a factor
in terms of the variables that matter, and throw that factor into the constraint satisfaction problem. And now the hard work comes in actually processing this set of factors. So a key definition is the weight of an assignment, which is the product of all the factors, and this is where all the magic happens: this is where you have to think globally about all the factors together. The point of a constraint satisfaction problem, again, is to find the maximum weight assignment, and this is again something that requires global reasoning over all the factors. So the motto to remember here is: specify locally when you're modeling, and optimize globally, which is what the inference algorithm will do.

[00:19:05] That's the end of this module.

================================================================================ LECTURE 025 ================================================================================
Constraint Satisfaction Problems (CSPs) 3 - Examples | Stanford
CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=Tu6BiZhMDCc
---
Transcript

[00:00:05] Hi. In this module I'm going to show you how you can take some real-world problems and model them as constraint satisfaction problems. We'll begin with our first example. The LSAT is the standardized test for admission into law school, and it features these logic puzzles. Here's one example of a logic puzzle: imagine you have three sculptures, A, B, and C, that are to be exhibited in two rooms, one and two, of an art gallery. The exhibition has imposed a certain number of conditions on you: sculptures A and B cannot be in the same room; sculptures B and C must be in the same room; and room two can hold only one sculpture.

[00:00:49] So how do you model this as a constraint satisfaction problem? Let's do it via this JavaScript demo. Erase that and start over. The first thing you want to do when you model is
figure out what the variables are. Looking back here, we want to put the three sculptures in rooms, so let's just define a variable for each of these sculptures.

[00:01:15] In this JavaScript demo, I'm going to define a variable A, and the domain of A is either one or two, depending on which room sculpture A should be placed in. I hit step, and I get a variable; I can mouse over it and see the domain of that variable. Okay, so now I can do the same for the other two sculptures, B and C, and you'll see that now I have three variables, A, B, and C, each of which can take on the values one or two.

[00:01:55] So now let me define the factors. I'm going to define a factor for each of these three conditions; usually each condition corresponds to a factor, but as we'll see later, that's not always the case. So the first condition
says that sculptures A and B cannot be in the same room. This is naturally a factor that touches variables A and B, so I'm going to call that factor AB; its scope is the variables A and B. And remember, a factor is a function that takes an assignment to the variables in its scope, A and B in this case, and returns a non-negative number. In this case I want it to be the case that A and B are not in the same room, so I'm going to return A not equal to B. If I hit enter, that gives me this factor, and I can check its table, which says 1, 2 is good and 2, 1 is also good, but 1, 1 and 2, 2 are not good.

[00:03:07] So now I'm going to move on to the second condition: sculptures B and C must be in the same room. This is similar, but now applied to B and C; they have to be in the same room, so I'm just going to
return B equals C. I'm going to check that this factor does what I want it to do: it's happy with 1, 1 and 2, 2, which is good.

[00:03:37] And now what about the final condition: room two can hold only one sculpture? This one's a little bit tricky, because it doesn't mention the sculptures exactly; it mentions only the room. But what it really means is that I have to look at all the sculpture variables. So I'm going to define a factor, let's call it R2, which depends on all the variables here, and I'm going to need to figure out whether room 2 has at most one sculpture. So let's keep a counter, and we're going to go through all the sculptures: if sculpture A is in room 2, I'm going to increment the counter; if sculpture B is in room 2, I'm going to increment the counter; and if sculpture C is in room 2, I'm going to increment
the counter. Now I'm going to return whether the number of sculptures in room 2, which is now n, is at most 1. Okay, so I make that factor, and I can see that this factor is happy if at most one sculpture, one or zero, is in room 2.

[00:04:56] Okay, so now I have defined my constraint satisfaction problem, or factor graph: a set of variables and a set of factors. And now if I press step, it will magically solve the CSP, and here there is one satisfying assignment, which assigns A to room two and assigns B and C to room one. So that's our first example of solving a constraint satisfaction problem.

[00:05:32] Here is another example, from object tracking. Suppose you're trying to build an autonomous driving system: you want to track where objects such as cars and pedestrians are, so you know where not to drive.
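As a cross-check on the demo result above, the whole sculpture CSP can be brute-forced in a few lines of plain Python. This is a restatement of the example, not the JavaScript demo's own code:

```python
from itertools import product

def weight(a, b, c):
    """Weight of placing sculptures A, B, C in rooms a, b, c (each 1 or 2)."""
    f_ab = 1 if a != b else 0                # A and B in different rooms
    f_bc = 1 if b == c else 0                # B and C in the same room
    n = sum(1 for v in (a, b, c) if v == 2)  # sculptures placed in room 2
    f_r2 = 1 if n <= 1 else 0                # room 2 holds at most one
    return f_ab * f_bc * f_r2

# Enumerate all 2^3 placements and keep the consistent ones.
solutions = [(a, b, c) for a, b, c in product([1, 2], repeat=3)
             if weight(a, b, c) == 1]
print(solutions)   # [(2, 1, 1)]: A in room 2, B and C in room 1
```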
So we're going to work with a very simplified setup here. The setting is that we have a number of discrete time steps: 0, 1, 2, 3, 4. At each time step, we get a sensor observation that gives us a noisy indicator of the position of a particular object. So maybe at time step one I observe that the object was at 0; at time step two I get an observation of 2; and at time step three I get an observation of 2. So the noisy sensors report the positions 0, 2, and 2.
[00:06:30] And we know that objects can't teleport. So the question is: what trajectory did the object take? Did it do something like this, and the sensor readings were actually correct? Or maybe it did something like that, or something completely different?

[00:06:50] So how do we do this? We're going to set up an object tracking CSP. Let's first define a factor graph. The variables of the factor graph are the position of the object at each time step, 1, 2, or 3 (there are three time steps), and the domain of each variable is 0, 1, or 2, so the object could be in position 0, 1, or 2. So Xi represents the true position of the object at time step i.

[00:07:24] Now we're going to define a bunch of observation factors, which incorporate the sensor information into the problem. So remember, at time step 1 we
observed that the object was at 0. Of course this is noisy, so we don't want to trust it completely; we're going to define an observation factor o1 that captures this. So o1 is going to be a unary factor: it depends only on X1, and it's going to highly favor assigning X1 to 0, which is the actual observation. But if X1 is at 1, which is a neighboring location, that's going to have a weight of 1. And if the object is too far away, at 2, then I'm going to say that's disallowed. So whenever you see a factor returning a weight of 0, that's a veto.

[00:08:28] Okay, so o2 is similar but applied to X2, which is the position of the object at time step 2. It's going to favor X2 being 2, degrade the weight if it's 1 away, and forbid it if it's 2 away. And o3 is also similar, but applied to X3, which is the object's position at time step 3. It's going to
favor X3 being 2, but also degrade and forbid assignments that are too far away.

[00:09:02] Okay, so we have three observation factors that capture the sensor readings. Now we're going to define transition factors, which represent the fact that an object's position can't change too much, or in other words, that objects can't teleport. Here we're going to write this factor a little differently; it's going to be a bit more compact. We're going to look at the absolute difference between the object's position at time step i and its position at the next time step, i + 1. If the object hasn't moved, which means the difference is 0, I'm going to assign a weight of 2; if it's moved by 1, I'm going to assign a weight of 1; and if it's moved by 2, I'm going to assign a weight of 0, which disallows it.

[00:09:52] Okay, so this concludes the
definition of the constraint satisfaction problem for this simple object tracking example. If I click on the demo, I can see what the CSP looks like in JavaScript code. I've defined three variables, x1, x2, x3. I define this helper function nearby, which returns 2 if a and b are equal, 1 if they're 1 apart, and 0 if they're 2 apart. And then I define these factors: o1, o2, o3, and t1 and t2. If I solve this CSP, it will return the set of nonzero-weight assignments, and I'll see that the maximum weight assignment is 1, 2, 2. So this is a solution to the CSP: it assigns X1 = 1, X2 = 2, and X3 = 2. Looking at the picture, it's 1, 2, 2, so we think the object probably took this path.

[00:11:07] Okay, so that's the end of this example. Now let's look at a third example:
event scheduling. CSPs are really well suited for scheduling problems in general. So here is a simple scheduling problem: you have a set of events that need to be assigned to a number of time slots. The events are numbered 1 through E and the time slots are numbered 1 through T. We have three conditions here. The first condition is that each event must be put in exactly one time slot. Condition 2 says that each time slot can have at most one event, so you can't double-book two events into one time slot. And condition 3 says that event e is allowed in time slot t only if the pair (e, t) exists in a set A of allowed pairs. So I can visualize A as a set of edges between the events and the time slots. And here is one possible assignment: I assign event 1 to time slot 2, assign event 2 to time slot 1,
and assign event 3 to time slot 3. Notice that I can't assign event 2 to time slot 2, because that would violate C3: there's no edge between event 2 and time slot 2.

[00:12:36] Okay, so how are we going to model this as a CSP? I'm actually going to show you not one but two possible formulations, which goes to show that there's some flexibility, or you could say artistic license, in how you decide to formulate problems as CSPs.

[00:12:58] The first formulation is going to look at it from the events' perspective. Here, for each event e, I'm going to define a variable Xe, and the domain of Xe is going to be some integer 1 through T. So notice that right off the bat I've satisfied condition C1, because in a CSP every variable has to take on exactly one value. And so that means that each
event will be put in exactly one time slot. So what about C2? Now I have to do something for C2. Notice that C2 is in terms of time slots, but our variables are in terms of events. So, as in the earlier puzzle example, that means we implicitly have to define factors that relate the variables. I'm going to define a constraint on every pair of events, making sure that the time slot that event e was assigned is not the same as the time slot that event e' was assigned. If I check this for all pairs of events, I've satisfied C2: I can guarantee that no time slot has two events piled onto it.

[00:14:31] Okay, so now what about C3? Each event is only allowed in certain time slots. So here again I'm going to look at each possible event, and I'm simply going to enforce
that whatever time slot event e was assigned, denoted Xe, the pair (e, Xe) is in the set of allowed (event, time slot) pairs. And that's enough to satisfy condition 3.

[00:15:01] Okay, so that's the first formulation of the CSP. Now let's look at an alternative formulation, this time from the perspective of time slots. Here I'm going to define a variable Yt for every possible time slot t, and Yt can take on a value which is either one of the possible events or "none," which means that no event has been assigned to that time slot. So notice that right off the bat I've satisfied condition 2, because every variable gets assigned exactly one value, either an event or no event, so you can't possibly assign two events to one time slot. Now we have to deal with condition 1. So how do we deal with it?
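As a concrete sketch of the first (event-centric) formulation just described: one variable Xe per event with domain {1, ..., T}, pairwise not-equal constraints for C2, and membership in A for C3. The particular allowed-pair set A below is an illustrative assumption, since the lecture only draws A as edges in a figure (it does contain (1,2), (2,1), (3,3) and not (2,2)):

```python
from itertools import product

E, T = 3, 3  # events 1..E, time slots 1..T
# Hypothetical allowed (event, time slot) pairs; A is only shown
# pictorially in the lecture, so this exact set is an assumption.
A = {(1, 2), (2, 1), (3, 1), (3, 3)}

def consistent(assign):
    """assign[e-1] is the time slot of event e. C1 holds by construction,
    since each variable takes on exactly one value."""
    # C2: no two events share a time slot (pairwise not-equal constraints).
    if len(set(assign)) != len(assign):
        return False
    # C3: every (event, slot) pair must be an allowed edge in A.
    return all((e + 1, t) in A for e, t in enumerate(assign))

solutions = [a for a in product(range(1, T + 1), repeat=E) if consistent(a)]
print(solutions)  # [(2, 1, 3)] -- the assignment shown in the lecture
```

The single consistent assignment, (2, 1, 3), is exactly the one in the lecture's picture: event 1 in slot 2, event 2 in slot 1, event 3 in slot 3.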
[00:15:56] So here, all the variables are in terms of time slots, but condition 1 is in terms of events. So again we're going to have to define a constraint that touches all the variables. For every event e, I need to enforce that, looking over all the time slots, that event shows up exactly once. What this is saying is that this factor looks at all of Y1 through YT and checks that Yt = e for exactly one t.

[00:16:39] So this will check the box for C1. And C3 is similar to before: for every time slot, we're going to enforce that either nothing was scheduled in that time slot, or, if something was scheduled, that the event and that time slot are compatible.

[00:16:59] Okay, so that concludes the definition of the second formulation. And now one might wonder which one
is better. This is a matter of efficiency, and there are various trade-offs, which are discussed more in the notes.

[00:17:19] Okay, so here is a final example of a CSP, which is going to be a little bit different, and so it will be kind of interesting: program verification. Everyone writes programs, and you're probably used to the idea of writing unit tests to check whether a program is correct. But just because your program passes a bunch of tests doesn't actually guarantee that it's correct, because you're never sure that you've covered all the cases. The idea behind program verification is to prove that your program works for all possible inputs.

[00:17:56] So let's work through a simple example. Suppose you have this program foo, which takes in two values x and y, and computes the following: it's going to assign x times x to a, it's going to add y times y to a and
then assign that to b, and then it's going to subtract a quantity from that, assign the result to c, and return c. The thing I want to prove here is the following specification: c is greater than or equal to zero, no matter what values x and y take.

[00:18:32] So here is how I'm going to specify the CSP. I'm going to define a set of variables that corresponds to both the inputs and also the intermediate quantities computed along the way: x, y, a, b, and c. And now I'm going to define a set of constraints corresponding to the program statements, which are going to relate these variables. For the first constraint, I'm going to have a = x², which captures what the first statement is doing; b = a + y², which captures the second program statement; and c = b − 2xy,
which captures the third program statement.

[00:19:18] Now, an important but really subtle note is that "equals" means two things here. In the Python program, = is an assignment operator: it says take the right-hand side, compute its value, and put it in the variable on the left-hand side. Whereas in the CSP, = represents mathematical equality: it asks whether the left-hand side is equal to the right-hand side. So don't be deceived by the looks of it: this factor is actually a function that takes in a value of a and a value of x and checks whether a equals x², returning a 1 or a 0.
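The checking-versus-assignment distinction can be made concrete in a couple of lines (a hypothetical sketch, not code from the lecture's demo):

```python
# Procedural assignment: compute the right-hand side and store it in a.
x = 3
a = x * x          # a is now 9

# CSP factor: given candidate values for a and x, *check* whether the
# constraint a == x^2 holds, returning weight 1 (satisfied) or 0 (veto).
def factor_a(a, x):
    return 1 if a == x * x else 0

print(factor_a(9, 3), factor_a(8, 3))  # 1 0
```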
So it's doing checking, whereas a = x * x is doing assignment: it's taking x² and putting it into a.

[00:20:17] Now there's a final constraint for the specification, and this is also kind of interesting. Note that we wanted to check that c ≥ 0 for all x and y, but we're going to negate that here, because CSPs only look for the existence of a particular assignment; they can't natively check all possible assignments. So we're going to negate it. Intuitively, what this is doing is looking for a counterexample: it's saying, hey, can we find a setting of x, y, a, b, and c such that c < 0?
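A brute-force version of this counterexample search, as a sketch: enumerate assignments to (x, y, a, b, c) and check all the program factors plus the negated specification. (A real verifier searches symbolically over all integers; the small finite window here is an assumption for illustration.)

```python
from itertools import product

R = range(-3, 4)  # small illustrative window of integer values
counterexamples = [
    (x, y, a, b, c)
    for x, y, a, b, c in product(R, repeat=5)
    if a == x * x           # factor checking the statement a = x * x
    and b == a + y * y      # factor checking b = a + y * y
    and c == b - 2 * x * y  # factor checking c = b - 2 * x * y
    and c < 0               # negated specification: look for c < 0
]
# No satisfying assignment exists, so within this window the
# specification c >= 0 holds (indeed c == (x - y)**2 >= 0 always).
print(counterexamples)  # []
```

An empty result means the CSP is unsatisfiable, which is exactly the condition under which the program meets its specification.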
[00:21:02] And if we can, that means the specification doesn't hold: there's a counterexample. But if we are not able to find any consistent assignment, if the CSP is not satisfiable, that means the program satisfies the specification. It's maybe a little counterintuitive at first, but we're proving correctness based on the fact that the CSP has no satisfying assignments.

[00:21:33] One thing that's really cool and interesting about formulating the program as a CSP, and about the fact that this mathematical equality is bidirectional, is that the CSP can reason in no particular order. It can start with the constraint c < 0 and work backwards through c, b, and a; or it can work forwards starting with x and y; or it can proceed in a more sophisticated order. Whereas if you were only to execute the
program, you can only go forwards. So this shows you the flexibility and power of reasoning over programs using a constraint satisfaction problem.

[00:22:19] Okay, so we've presented a number of examples of real-world problems and shown you how to formulate each of them as a CSP (or two). So how do you do it? Well, the first step is to decide on the variables and the domains, and you want to check that an assignment to all of these variables gives you the result of interest. Then we take a look at all the desiderata, the constraints and the preferences, the wishes, and translate them into a set of factors. The nice thing about CSPs is that this process is often parallelizable: usually each desideratum translates into a factor or a set of factors, and then at the end of the day you just throw all the factors into your CSP.

[00:23:09] So there are some
notes to keep in mind when you're designing constraint satisfaction problems. You should keep the CSPs small so that they will be more computationally efficient to solve, which means having fewer variables, fewer factors, smaller domains, and smaller arities. You can't make everything small, and there are various trade-offs; exactly what the recipe for computational efficiency is really depends on the problem. There's no general rule, so there's going to be a little bit of art here.

[00:23:51] And finally, one reminder: when you think about implementing each factor, it is true that each factor is itself a little mini-program, but you should really think of it in terms of checking a solution, checking whether an assignment to the variables of that factor is valid, rather than trying
to compute the solution. So "equals" is mathematical equality rather than assignment. This is really important, and it takes a little getting used to, because CSPs require a fundamentally different mindset from normal procedural programming, which is most salient in the program verification example. But hopefully, after a bit of practice, thinking in terms of CSPs will become second nature.

[00:24:46] All right, so that's the end of this module.

================================================================================ LECTURE 026 ================================================================================ Constraint Satisfaction Problems (CSPs) 4 - Dynamic Ordering | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=Lyu8VzbIe_A --- Transcript

[00:00:06] Hi. In the previous module we looked at modeling. In this module I'm going to start talking about inference, in particular introducing backtracking with dynamic ordering. So, just a
So, just a quick refresher: remember that a CSP is defined by a factor graph, which has a set of variables X1, ..., Xn, where each variable Xi takes on values in Domain_i, and it also has some factors f1 through fm, where each factor fj is a function that takes a subset of the variables and returns a non-negative quantity. [00:00:50] The assignment weight is defined as follows: each assignment to all the variables has a weight, which is given by the product of all the factors, and the goal in solving a CSP is to compute the maximum weight assignment.
[00:01:15] So let's start with backtracking search, which we already talked about a little bit. Backtracking search is going to be the blueprint for the algorithms that we're going to talk about. We start with the empty assignment, where no variable has any value, and we choose one of the variables and assign it a particular value, red in this case.
Then we recurse: we pick another variable, assign it a value, and keep recursing. Then we back up, backtrack, and try green; backtrack, try blue; and we backtrack up here. Now we're going to try setting WA to green down here, explore this subtree, then come back up, explore NT green, NT blue, and so forth. [00:02:19] So at the bottom of this tree we have the leaves, and each leaf is a complete assignment, and each assignment has a weight which we can compute. Once we've searched through all the assignments, we simply take the assignment with the maximum weight. This is the most straightforward way of taking a CSP and solving it using backtracking search. [00:02:53] The first thing we'll note is that we can actually compute the weights of partial assignments as we go, rather than waiting until the very end to compute the weight of an entire assignment.
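The naive enumerate-every-leaf approach just described can be sketched in Python on a small fragment of the lecture's Australia map-coloring CSP. This is a minimal, hypothetical sketch (the representation of factors as (scope, function) pairs is my assumption, not the course's codebase):

```python
from itertools import product

# Hypothetical toy CSP: color part of the Australia map with 3 colors.
variables = ["WA", "NT", "SA", "Q"]
domains = {v: ["R", "G", "B"] for v in variables}
# Each factor checks one adjacency constraint and returns 0.0 or 1.0.
factors = [
    (("WA", "NT"), lambda a, b: float(a != b)),
    (("WA", "SA"), lambda a, b: float(a != b)),
    (("NT", "SA"), lambda a, b: float(a != b)),
    (("NT", "Q"),  lambda a, b: float(a != b)),
    (("SA", "Q"),  lambda a, b: float(a != b)),
]

def weight(assignment):
    """Weight of a complete assignment = product of all factors."""
    w = 1.0
    for scope, f in factors:
        w *= f(*(assignment[v] for v in scope))
    return w

def brute_force():
    """Visit every leaf of the search tree; keep the max-weight assignment."""
    best, best_w = None, 0.0
    for values in product(*(domains[v] for v in variables)):
        a = dict(zip(variables, values))
        w = weight(a)
        if w > best_w:
            best, best_w = a, w
    return best, best_w

best, best_w = brute_force()
print(best, best_w)  # a consistent coloring with weight 1.0
```

This enumerates all 3^4 = 81 leaves; the rest of the module is about doing far less work than this.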
[00:03:07] So here's how it's going to proceed. Let's start with the empty assignment, and we assign WA red. We can't evaluate any of the factors so far, but once we assign NT, we can actually evaluate this factor and test whether WA ≠ NT. These other factors we can't evaluate yet, because we don't know the values of those variables, but we can move on. Now we recurse: we assign SA a value, and now we can evaluate these factors, WA ≠ SA and NT ≠ SA. Then we assign Q, and we can evaluate these two factors; assign NSW, and we can evaluate these two factors; [00:04:01] and assign V, and we can now evaluate these two factors, and those are all the factors in the CSP. So at any point in time, for example at NSW, we have this partial assignment here, and we define the weight of that partial assignment to be the product of all the factors that we can evaluate.
A factor is evaluable if all the variables in the scope of that factor have been set. [00:04:45] More formally, suppose we have a partial assignment x. We're going to define the set of dependent factors as follows: D of a partial assignment x and a particular variable Xi is the set of factors depending on Xi and x but not on the unassigned variables. So, for example, D of this partial assignment up here and this variable SA is simply these two factors here; these are the factors that are going to be multiplied in when Xi is set. [00:05:31] Okay, so now we're ready to present our main backtracking search algorithm. This is going to be a general blueprint for many of the bells and whistles that we're going to talk about soon. Backtrack takes a partial assignment x, the weight of that partial assignment, which is the product of all the factors that we can evaluate so far, and the domains, which specify the valid possible values for each of the variables in the CSP; more on this in a bit.
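The dependent-factor set D(x, Xi) and the resulting multiplicative weight update can be sketched as follows (a hypothetical sketch on a three-variable slice of the map-coloring example; `dependent_factors` and `delta` are my own illustrative names):

```python
# Factors as (scope, function) pairs; 0/1-valued inequality constraints.
factors = [
    (("WA", "NT"), lambda a, b: float(a != b)),
    (("WA", "SA"), lambda a, b: float(a != b)),
    (("NT", "SA"), lambda a, b: float(a != b)),
]

def dependent_factors(x, xi):
    """D(x, Xi): factors that mention Xi and whose other variables are
    already assigned in the partial assignment x."""
    assigned = set(x) | {xi}
    return [(scope, f) for scope, f in factors
            if xi in scope and all(v in assigned for v in scope)]

def delta(x, xi, v):
    """Multiplicative weight update when extending x with Xi = v."""
    x2 = dict(x, **{xi: v})
    d = 1.0
    for scope, f in dependent_factors(x, xi):
        d *= f(*(x2[u] for u in scope))
    return d

x = {"WA": "R"}
print(len(dependent_factors(x, "NT")))  # 1: only the WA-NT factor is evaluable
print(delta(x, "NT", "R"))              # 0.0  (conflicts with WA = R)
print(delta(x, "NT", "G"))              # 1.0
```

Note the NT-SA factor is not in D({WA: R}, NT) because SA is still unassigned; it gets multiplied in later, when SA is set.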
[00:06:06] If x is a complete assignment, then we have reached a leaf: we look at its weight, update our current best, and return. If not, we're going to choose an unassigned variable Xi, look at the values in Domain_i of Xi, and order them somehow. We're going to go through each value v in that order and compute a weight update: we look at the assignment which is x extended with Xi set to v, and then we look at all the factors in the dependent set of factors of x, the partial assignment, and the new variable Xi that we're going to assign, and multiply all those factors evaluated at this extended assignment. That number we're going to call delta, which is going to be the update on w.
[00:07:16] Okay, so if delta equals zero, then we stop there and don't recurse further, because remember, any factor being zero is enough to zero out the weight of a given assignment. If not, then we continue: we're going to do this thing called look ahead, which takes the domains and tries to reduce them, tries to prune things away, based on this new assignment of Xi to v. Now, if any of these domains becomes empty, then we can again prune and stop recursing; otherwise we're going to recurse and backtrack on this extended assignment with the updated weight w · delta and the new domains that we've computed via look ahead.
[00:08:11] Okay, so this recipe has three choice points: how to choose the unassigned variable, how to order the values of the unassigned variable, and finally this look ahead, which is how we prune the domains. We're going to talk about each of these in turn, starting with the look ahead.
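The whole backtracking blueprint can be sketched in a few lines of Python. This is a minimal, hypothetical version in which the three choice points are filled in with the trivial strategies (first unassigned variable, domain order as given, no look-ahead pruning), so later heuristics can be read as drop-in replacements:

```python
variables = ["WA", "NT", "SA"]
# 0/1-valued inequality factors over a triangle of adjacent regions.
factors = [
    (("WA", "NT"), lambda a, b: float(a != b)),
    (("WA", "SA"), lambda a, b: float(a != b)),
    (("NT", "SA"), lambda a, b: float(a != b)),
]
best = {"assignment": None, "weight": 0.0}

def dependent_factors(x, xi):
    assigned = set(x) | {xi}
    return [(s, f) for s, f in factors
            if xi in s and all(v in assigned for v in s)]

def backtrack(x, w, domains):
    if len(x) == len(variables):              # base case: a leaf
        if w > best["weight"]:
            best["assignment"], best["weight"] = dict(x), w
        return
    xi = next(v for v in variables if v not in x)   # choice point 1: MCV goes here
    for v in domains[xi]:                            # choice point 2: LCV goes here
        x2 = dict(x, **{xi: v})
        delta = 1.0
        for s, f in dependent_factors(x, xi):
            delta *= f(*(x2[u] for u in s))
        if delta == 0:             # one zero factor zeroes the whole weight
            continue
        # Commit xi to v in a copy of the domains.
        d2 = {u: (list(dom) if u != xi else [v]) for u, dom in domains.items()}
        # Choice point 3: look ahead (e.g. forward checking) would prune d2 here.
        if any(len(dom) == 0 for dom in d2.values()):
            continue               # some variable became unsatisfiable
        backtrack(x2, w * delta, d2)

backtrack({}, 1.0, {v: ["R", "G", "B"] for v in variables})
print(best["weight"])  # 1.0 — WA, NT, SA get three distinct colors
```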
[00:08:39] So we're going to introduce a simple form of look ahead called forward checking. First, we're going to visualize the domains of each of the variables as a set of valid colors above the respective variable; in the empty assignment, all the values are allowed. [00:09:02] Now we're going to set, let's say, WA equals red. At this point two things happen. First, we wipe out all the other values from that variable's domain, which makes it clear that we're committing to that value. But in addition, what we're going to do is a one-step look ahead, forward checking: we're going to eliminate the inconsistent values from the domains of Xi's neighbors. In this case, we're going to look at the neighbors of WA, which are NT and SA, and we're going to remove red from those domains. And why is that? Because this factor says that if this is red, then this can't be red, so red is gone now.
[00:09:50] Okay, so now backtracking search is going to recurse, and let's say it sets NT to green. Again I do a one-step look ahead: I look at the neighbors of NT, and I wipe out green from those domains. Suppose I recurse again and now set Q to blue. Again, one step of look ahead: I wipe out blue from its neighbors. Now look what happens: SA has an empty domain, which means that there are no possible values that I can set SA to, to make the assignment consistent. [00:10:39] So in this case, if any domain becomes empty, I simply return here. This is important, because all these other variables have not been set yet, and rather than recursing and trying to set them all sorts of different ways, I already know at this point that SA is not settable, so I just stop there. This is how forward checking allows me to use these domains to prune.
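The forward-checking trace just walked through (WA = red, then NT = green, then Q = blue, leaving SA's domain empty) can be reproduced with a short sketch. It assumes all factors are inequality constraints between neighbors, which is true of the map-coloring example but not of CSPs in general:

```python
# Adjacency for the relevant part of the Australia map (lecture's example).
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
             "SA": ["WA", "NT", "Q"], "Q": ["NT", "SA"]}

def forward_check(domains, xi, v):
    """Commit xi to v and do one-step look ahead: remove values that
    conflict with v from the domains of xi's neighbors (inequality factors)."""
    d = {u: list(dom) for u, dom in domains.items()}  # copy, don't mutate
    d[xi] = [v]
    for u in neighbors[xi]:
        d[u] = [w for w in d[u] if w != v]
    return d

domains = {u: ["R", "G", "B"] for u in neighbors}
domains = forward_check(domains, "WA", "R")   # NT, SA lose red
domains = forward_check(domains, "NT", "G")   # SA, Q lose green
domains = forward_check(domains, "Q", "B")    # SA loses blue -> empty!
print(domains["SA"])  # [] — prune this branch without recursing further
```

An empty domain anywhere means no extension of the current partial assignment can be consistent, so the search returns immediately.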
[00:11:12] Okay, so forward checking is also going to allow me to choose an unassigned variable and to order the values of a variable, as follows. Suppose we're in this situation: WA and NT have been set, and I've applied forward checking to propagate the constraints to all the other variables. Now the question is: which variable do I assign next? There is this heuristic called most constrained variable, MCV, which simply chooses the variable that has the smallest domain. [00:11:49] So what are the domain sizes here? There are two elements for Q, three elements here, one element here, so SA is the variable that has the smallest domain; it has only one element. The intuition here is that I want to restrict the branching factor and choose variables that have small branching, determined by the number of elements in their domains.
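The MCV heuristic itself is one line over the pruned domains. A minimal sketch, using the domain sizes from the situation in the lecture (the exact domain contents below are my reconstruction for illustration):

```python
def most_constrained_variable(domains, assigned):
    """Pick the unassigned variable with the smallest remaining domain,
    i.e. the smallest branching factor — fail early if we must fail."""
    unassigned = [v for v in domains if v not in assigned]
    return min(unassigned, key=lambda v: len(domains[v]))

# After WA and NT are set and forward checking has run:
domains = {"Q": ["R", "B"], "SA": ["B"], "NSW": ["R", "G", "B"]}
print(most_constrained_variable(domains, assigned={"WA", "NT"}))  # SA
```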
number of [00:12:15] branching determined by number of elements in that [00:12:18] domain so the [00:12:21] domain so the second choice point is once I've [00:12:24] second choice point is once I've selected a variable how do I order the [00:12:27] selected a variable how do I order the values to explore [00:12:30] values to explore so consider the following so I'm trying [00:12:32] so consider the following so I'm trying to assign a value to Q do I first try [00:12:36] to assign a value to Q do I first try red or do I try [00:12:40] red or do I try blue so the idea behind this herisa [00:12:43] blue so the idea behind this herisa called least constraint value is I'm [00:12:46] called least constraint value is I'm going to order the values of um the [00:12:51] going to order the values of um the selected x i by decreasing number of [00:12:55] selected x i by decreasing number of consistent values of neighboring VAR [00:12:57] consistent values of neighboring VAR okay so what does this mean [00:12:59] okay so what does this mean on this example so I look at [00:13:02] on this example so I look at Q and remember I set this to Red [00:13:06] Q and remember I set this to Red tentatively and I propagate um via [00:13:10] tentatively and I propagate um via forward checking to its neighbor so I [00:13:11] forward checking to its neighbor so I wiped out red here and now I look at the [00:13:14] wiped out red here and now I look at the neighbors and say how many possible uh [00:13:18] neighbors and say how many possible uh consistent values are there so there are [00:13:20] consistent values are there so there are two plus two plus two so that's uh six [00:13:24] two plus two plus two so that's uh six values and what about if I set it to [00:13:26] values and what about if I set it to Blue here and I've eliminate blue B from [00:13:30] Blue here and I've eliminate blue B from these neighbors and the number of [00:13:32] these neighbors and the number of consistent values is 
[00:13:37] Six is larger than four, so I'm going to try red first in this case. Intuitively, why does this make sense? I want to choose a value that gives as much freedom as possible to its neighbors, so that I don't run into trouble and get things to be inconsistent. You can see that by having red here and red here, I have more options for the neighbors NT and SA than over here, where I can only do green; and if you look even one step ahead, you'll notice that this is already going to cause trouble. So least constrained value orders the values in order to free up the neighbors as much as possible.
[00:14:34] This might seem a little bit strange: most constrained variable, least constrained value — these seem superficially at odds with each other. But there is a reasoning, which is that variables and values are very different.
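The LCV ordering can be sketched as follows. The neighbor domains below are chosen (as an assumption) to reproduce the lecture's counts of 2 + 2 + 2 = 6 for red versus 1 + 1 + 2 = 4 for blue, and the "consistent values" count again assumes inequality factors:

```python
neighbors = {"Q": ["NT", "SA", "NSW"]}

def lcv_order(domains, xi):
    """Order values of xi by decreasing total number of consistent values
    remaining in the neighbors' domains (most freedom first)."""
    def freedom(v):
        # With inequality factors, choosing v only rules out v itself
        # in each neighbor's domain.
        return sum(len([w for w in domains[u] if w != v])
                   for u in neighbors[xi])
    return sorted(domains[xi], key=freedom, reverse=True)

# Situation from the lecture: Q can be red or blue.
domains = {"Q": ["R", "B"],
           "NT": ["G", "B"], "SA": ["G", "B"], "NSW": ["R", "G", "B"]}
print(lcv_order(domains, "Q"))  # ['R', 'B']: red leaves 6 options, blue only 4
```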
[00:14:56] In a CSP, every variable must be assigned; we can't leave a variable alone and hope that the problem will disappear later. So what we're going to do is try to choose the most constrained variables first: if we're going to fail, we might as well choose the hardest variables first and fail early, which leads to more pruning. [00:15:24] On the other hand, for values, we just need to choose some value for each variable. So what we're going to try to do is choose the value that is most likely to lead to a solution. It doesn't matter if some value is going to cause trouble, because if we choose a value that happens to work, then maybe we'll be happy.
[00:15:52] So when do these heuristics help? The most constrained variable heuristic is useful when some of the factors are constraints.
[00:16:00] It's okay if some of the factors are not constraints, but it's important that at least one of the factors is a constraint, meaning something that can return zero. If all the factors are returning non-zero values, then none of these heuristics is going to be helpful; you kind of have to explore everything. [00:16:28] Least constrained value is useful when all the factors are constraints, in other words, when the weights are one or zero, so they have to look like this factor but not like this one. The rationale here is that we don't have to find all of the assignments in this case: if we find an assignment that has weight one, then we know we're done, because one is the maximum weight possible, and we just return immediately and stop early.
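The early-stopping argument when all factors are constraints can be made concrete with a small sketch: since every factor is 0/1-valued, the maximum possible weight is 1, so the search may return the first weight-1 assignment it finds (a hypothetical brute-force version just to illustrate the stopping rule):

```python
from itertools import product

variables = ["WA", "NT", "SA"]
# All factors are constraints: they return only 0.0 or 1.0.
factors = [lambda a: float(a["WA"] != a["NT"]),
           lambda a: float(a["NT"] != a["SA"]),
           lambda a: float(a["WA"] != a["SA"])]

def first_satisfying(domains):
    """Return the first assignment of weight 1; weight 1 is optimal,
    so there is no need to keep searching for something better."""
    for values in product(*(domains[v] for v in variables)):
        a = dict(zip(variables, values))
        if all(f(a) == 1.0 for f in factors):
            return a
    return None

a = first_satisfying({v: ["R", "G", "B"] for v in variables})
print(a)  # first consistent coloring found
```

With general non-negative weights (2, 4, 17, ...) this shortcut is unsound, because a later assignment could always have a larger weight.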
different [00:17:15] values oh VAR weights of different magnitudes then we can't necessarily [00:17:17] magnitudes then we can't necessarily stop if we find a weight of two well [00:17:20] stop if we find a weight of two well maybe there's a weight another [00:17:21] maybe there's a weight another assignment that has a weight of four or [00:17:22] assignment that has a weight of four or eight or 17 or so on we don't really we [00:17:26] eight or 17 or so on we don't really we can't really stop early enough [00:17:30] can't really stop early enough and notice that we need forward checking [00:17:33] and notice that we need forward checking uh to make both of these things work [00:17:35] uh to make both of these things work because these horis sixs rely on [00:17:38] because these horis sixs rely on Counting the number of elements in [00:17:40] Counting the number of elements in domains and so we need to groom or prune [00:17:42] domains and so we need to groom or prune these domains so that theistic [00:17:47] are okay so let's conclude here so we [00:17:51] are okay so let's conclude here so we presented backtracking search so [00:17:54] presented backtracking search so backtracking search has uh three Choice [00:17:57] backtracking search has uh three Choice points first we need to choose an [00:18:00] points first we need to choose an unassigned variable XI this is done by [00:18:04] unassigned variable XI this is done by using um most constrain variable [00:18:08] using um most constrain variable MCV once we found a variable to assign [00:18:12] MCV once we found a variable to assign we're going to order the values of that [00:18:16] we're going to order the values of that unassigned variable based on the lcv [00:18:20] unassigned variable based on the lcv heris or least constrain [00:18:23] heris or least constrain value and then we're going [00:18:26] value and then we're going to um compute the updated weight as we [00:18:30] to um compute the updated weight 
[00:18:34] Then we're going to update the domains via one-step look ahead, a.k.a. forward checking; and if the number of elements in any domain is zero, then we stop there and don't recurse; otherwise we recurse. [00:18:52] Notice that none of these heuristics is guaranteed to speed up backtracking search; there's no theory here, but often in practice they can make a big difference. Next time we'll look at the look ahead and see how we can improve upon forward checking. So that's it.
================================================================================ LECTURE 027 ================================================================================
Constraint Satisfaction Problems (CSPs) 5 - Arc Consistency | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=5rlIYGJdPy4
--- Transcript
[00:00:05] Hi. In this module I'm going to be talking about the notion of arc consistency. This is going to lead us to an algorithm called AC-3, which is going to enable us to prune domains much more aggressively than before in the context of backtracking search.
[00:00:21] Let's begin. First I want to review backtracking search. Backtracking search is a recursive procedure that takes a partial assignment x, its weight, and the domains of each of the variables in the CSP. If all the variables have already been assigned in x, then we just see if it's better than the best assignment we've seen so far, and if so, update it, and then we return; this is the base case. [00:00:55] Otherwise, we're going to choose an unassigned variable Xi, look at all the values in the domain of Xi, and order them according to some heuristic (LCV). Now we're going to step through each of the values v in that order; we compute the weight update based on Xi being set to v, and if this is zero, then we can just stop recursing right there. Otherwise, we're going to use this updated assignment as an input into the look-ahead algorithm to reduce the domains.
[00:01:41] And now, if any of the domains becomes empty, then again we stop recursing; otherwise we recurse. [00:01:49] So last time we talked about the heuristics for choosing an unassigned variable and for ordering the values — these are the MCV and LCV heuristics — and then we looked at forward checking, which was a one-step look ahead. Now we're going to upgrade that to AC-3.
[00:02:08] Before we get into AC-3, I need to talk about arc consistency. Let's use a simple example. Suppose we have just two variables, Xi and Xj. Xi can be 1, 2, 3, 4, or 5, and Xj can be 1 or 2, and Xi and Xj are related via a single factor which says that their sum must equal 4 exactly. [00:02:39] So what does it mean to enforce arc consistency on, let's say, Xi?
means I'm going to go through each of the values in the domain of Xi and try to eliminate it, eliminating it if it can't be satisfied by any value in Xj's domain. [00:02:58] Okay, so let's try this. Look at 1: does there exist any possible setting of Xj so that I can do 1 plus something to get 4? 1 plus 1 is not 4, 1 plus 2 is not 4, so therefore 1 is just impossible, without even knowing the value of Xj, so let me eliminate it. [00:03:20] What about 2? Well, I can set Xj to 2 to get 4, so that's okay. Notice that it's fine that 1 plus 2 isn't 4; it just matters that there exists one of the values in Xj that works. So let's leave 2 alone. What about 3? Well, 3 plus 1 is 4, so that's okay too. What about 4? I can't add 4 to 1 or 2 to get 4, so that gets eliminated, and same with 5. [00:03:55] So in the end, enforcing arc
consistency on Xi results in a smaller domain, which only consists of 2 and 3. So notice I can eliminate values without even knowing what the exact value of Xj is. [00:04:13] So more formally, arc consistency is a property, which I'll explain. A variable Xi is arc consistent with respect to another variable Xj if for each value in the domain of Xi there exists some value in the domain of Xj such that essentially all the factors check out. Formally, what that means is that if you look at all the factors whose scope contains Xi and Xj and you evaluate each such factor on Xi and Xj, then you get something that's not zero. [00:04:56] Okay, so enforcing arc consistency is a procedure that takes two variables and simply removes values from Domain_i to make Xi arc consistent with respect to Xj, exactly what we did in the example on the previous slide.
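As a concrete sketch, the enforcement procedure just described can be written directly from the definition: keep a value of Xi only if some value in Xj's domain makes every shared factor nonzero. This is a minimal Python illustration with hypothetical helper names (not the course's reference code), run on the Xi + Xj = 4 example from the lecture.

```python
def enforce_arc_consistency(domains, i, j, factors):
    """Shrink domains[i] so that Xi is arc consistent w.r.t. Xj.

    factors: binary functions f(xi_value, xj_value) -> non-negative weight;
    for simplicity we assume every factor's scope is exactly {Xi, Xj}.
    Returns True if domains[i] changed.
    """
    old = domains[i]
    domains[i] = {a for a in old
                  if any(all(f(a, b) > 0 for f in factors) for b in domains[j])}
    return domains[i] != old

# The lecture's example: Xi in {1..5}, Xj in {1, 2}, one factor Xi + Xj == 4.
domains = {"i": {1, 2, 3, 4, 5}, "j": {1, 2}}
changed = enforce_arc_consistency(domains, "i", "j", [lambda a, b: a + b == 4])
print(domains["i"])  # {2, 3}: 1, 4, and 5 are eliminated, as in the lecture
```

Note that Xj's domain is untouched: enforcing arc consistency on Xi with respect to Xj only ever shrinks Domain_i.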
[00:05:18] So let's revisit the Australia example and apply AC-3. Okay, so here is the empty assignment, and here are all the domains of each of the variables. [00:05:36] So let's suppose we set WA to be red. As before, we eliminate the other values from WA's domain, of course, and then we enforce arc consistency on the neighbors of WA, in this case NT and SA, so out goes red on both of these. And now we continue and try to enforce arc consistency on the neighbors of NT and SA, but in this case I can't actually eliminate anything, so now we're going to recurse. [00:06:19] And suppose now, in the next level of backtracking, we assign NT green. So now again we're going to enforce arc consistency on the neighbors of NT, so that will eliminate green from these two. So notice that one step should look very, very familiar; this is exactly
forward checking. But AC-3 doesn't stop there; it then says enforce arc consistency on the neighbors of Q and SA. Okay, so let's enforce arc consistency on the neighbors of SA; that eliminates blue from its neighbors. [00:07:03] And now let's enforce arc consistency on the neighbors of Q, so that eliminates red from the neighbors. And now let's enforce arc consistency on the neighbors of NSW; well, that eliminates green, and at this point now we're done. [00:07:27] So notice what happened: each of these domains is only left with one value. So even though we're still in the context of backtracking search at NT, and we're still trying to figure out what to do with NT, by looking ahead we've actually seen what values are even possible, and we've essentially solved the problem. Now, formally, we haven't set these values yet; we've just reduced their
domains. But for backtracking search, recursing on the rest of these values should be really a walk in the park: you go into SA and you set it to blue, set Q to red, NSW to green, and V to red, and you're done. [00:08:13] So this shows you the power of AC-3: in one fell swoop it can basically clean out a lot of the domains and reveal what assignment values are actually possible here. [00:08:30] So here is AC-3 more formally. Remember forward checking: what you do is, when you assign the variable Xj to some value xj, literally you set the domain to only include that value, and then you enforce arc consistency on each neighbor Xi with respect to Xj. [00:08:58] Okay, so here's a picture: you're setting Xj, and then you consider all the neighbors of Xj, for example Xi, and then you enforce arc consistency on Xi.
So you try to propagate what you know about Xj to Xi and try to eliminate values from Xi's domain. [00:09:17] So now AC-3 just repeatedly enforces arc consistency until there's nothing left to do. So here's the algorithm: we're going to maintain a working set S of variables that we need to process. We start with Xj, which is the variable that we just assigned, and while there are still variables to process, we're going to remove any Xj from S; the order doesn't really matter here. Then, for each of the neighbors Xi of Xj, we're going to enforce arc consistency on that neighbor with respect to Xj, so propagate the constraints out. [00:09:59] And now, if the domain of Xi changed, then we're going to add Xi to S, because we know more about Xi now and we can hopefully propagate the information farther to its neighbors. [00:10:15] So notice that a variable could be revisited multiple times, so this is kind of like breadth
breadth [00:10:23] times so this is kind of like breadth first search with exception that [00:10:26] first search with exception that you might visit a node [00:10:28] you might visit a node more than once because you might [00:10:30] more than once because you might propagate some value to another neighbor [00:10:33] propagate some value to another neighbor and that [00:10:34] and that uh value might be [00:10:36] uh value might be constrained something else and then you [00:10:38] constrained something else and then you might get more additional information [00:10:40] might get more additional information back um and this can kind of go on for a [00:10:43] back um and this can kind of go on for a while but it does run in [00:10:46] while but it does run in polynomial time you can read the notes [00:10:48] polynomial time you can read the notes for a little bit more details about the [00:10:49] for a little bit more details about the running time [00:10:52] so [00:10:54] so as great as ac3 might seem it's not a [00:10:57] as great as ac3 might seem it's not a panacea and it shouldn't be and it [00:11:00] panacea and it shouldn't be and it shouldn't be surprising because solving [00:11:02] shouldn't be surprising because solving a csp should take an exponential time [00:11:05] a csp should take an exponential time in general and ac3 isn't doing [00:11:08] in general and ac3 isn't doing any sort of backtracking search [00:11:10] any sort of backtracking search so here is a small example that shows [00:11:13] so here is a small example that shows when ac3 doesn't do anything [00:11:16] when ac3 doesn't do anything so here we have a mini australia here [00:11:18] so here we have a mini australia here with three [00:11:19] with three variables [00:11:20] variables and suppose each of them can either be [00:11:22] and suppose each of them can either be red or blue red or blue red or blue [00:11:25] red or blue red or blue red or blue so immediately you should [00:11:27] 
realize that there is no consistent assignment of three variables with only two colors such that no pair can have the same color. But what happens if you run AC-3? [00:11:44] Okay, so let's look at this factor here, between WA and NT. This is arc consistent, because if I assign WA red then NT can be blue, and if I assign WA blue then NT can be red. So if I just look at this local configuration, there's no problem; and analogously, if I look over here there's no problem, and if I look over here there's no problem. [00:12:10] So AC-3 doesn't detect a problem, even though there's no satisfying assignment. The intuition here is that AC-3, and arc consistency in general, is only looking locally at the graph, so it only detects problems that are blatantly wrong, which can be detected locally; but you can't avoid exhaustive search to actually detect the deeper problems.
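The AC-3 loop just described can be sketched in a few lines of Python. This is a hypothetical sketch (not the course's reference implementation), using pairwise not-equal constraints; running it on the mini-Australia triangle shows exactly the failure mode above: every arc is locally consistent, so no domain shrinks even though the CSP is unsatisfiable.

```python
def revise(domains, i, j, constraint):
    """Enforce arc consistency on Xi w.r.t. Xj; return True if Domain_i shrank."""
    old = domains[i]
    domains[i] = {a for a in old if any(constraint(a, b) for b in domains[j])}
    return domains[i] != old

def ac3(domains, neighbors, constraint, start):
    """Propagate from `start` until no domain changes (S is the working set)."""
    s = {start}
    while s:
        j = s.pop()
        for i in neighbors[j]:
            if revise(domains, i, j, constraint):
                if not domains[i]:
                    return False  # empty domain: inconsistency detected, prune
                s.add(i)          # Xi changed, so revisit Xi's neighbors later
    return True

# Mini Australia: WA, NT, SA pairwise not-equal, only two colors each.
domains = {v: {"red", "blue"} for v in ("WA", "NT", "SA")}
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA"], "SA": ["WA", "NT"]}
ok = ac3(domains, neighbors, lambda a, b: a != b, "WA")
print(ok, domains)  # True, and every domain is still {red, blue}: no problem found
```

Note how a variable re-enters the working set whenever its domain changes, which is why, unlike plain breadth-first search, a node can be visited more than once.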
[00:12:44] So let me summarize. Enforcing arc consistency is a way to take what you know about one variable's domain and propagate that information, via the factors, to reduce the domains of its neighbors. [00:13:01] Forward checking only applies arc consistency to the neighbors of the assigned variable, and this was already somewhat effective. AC-3 just takes that to the extreme limit and enforces arc consistency on the neighbors, and their neighbors, and their neighbors, and so on, until you converge. So it's trying to exhaustively enforce arc consistency as much as possible, to eliminate as many values from the domains as possible. [00:13:30] And of course, remember that AC-3, forward checking, and lookahead are algorithms used in the context of backtracking search to detect inconsistencies, so we can prune early, and also to maintain these
domains so that we can use them for heuristics such as MCV and LCV. Lookahead turns out to be very, very important for backtracking search: if you can look ahead and detect an inconsistency, then that saves you the work of actually having to recurse and explore a combinatorial number of possibilities. [00:14:03] Okay, that's the end. ================================================================================ LECTURE 028 ================================================================================ Constraint Satisfaction Problems (CSPs) 6 - Beam Search | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=XuWMeIHGkus --- Transcript [00:00:05] Hi, in this module I'm going to talk about beam search, a really simple algorithm for finding approximate maximum weight assignments efficiently, when you're in a hurry and don't want to incur the full cost of backtracking search. [00:00:19] So, just as review, remember constraint satisfaction, or a CSP, is defined by a factor graph, which consists of a set of variables X1 through Xn, where each Xi can be some element of Domain_i
and a set of factors F1 through Fm, where each factor Fj is a function that takes an assignment and returns a non-negative number, and usually the factor function depends only on a subset of the variables. [00:00:51] So each assignment, little x, to all the variables has a weight, and that weight is given by simply the product of all the factors applied to the assignment, and the objective is to find the maximum weight assignment. [00:01:10] So let us revisit the object tracking example. In this example, we're trying to track an object over time, and at each time step we record a noisy sensor reading of its position. So at time step 1 we see 0, at time step 2 we see 2, and at time step 3 we see 2. And the question is: what was the trajectory that the object took? Is it this one, or this one, or something else? [00:01:40] We modeled this problem as a CSP with variables X1, X2, and X3.
We defined factors that captured our intuitions about the problem. O1 captures the fact that the actual position should be close to the sensor reading: so 2 is the weight assigned to 0, meaning X1 = 0 is favored and X1 = 2 is disallowed. Similarly, O2 favors X2 = 2, and O3 favors X3 = 2. [00:02:19] And finally, the transition factors T1 and T2 favor adjacent Xi's which are close: a distance of zero gets a weight of 2, whereas a distance of one gets 1, and so on. [00:02:37] Okay, and you can click on this demo to actually play with this CSP; we'll come back to it in a bit. [00:02:49] Okay, so this is the object tracking example. So far we've seen backtracking search as a way to compute maximum weight assignments, and backtracking search essentially does an exhaustive depth-first search of the entire tree in the worst case, which can take a very, very long time.
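To make the weights concrete, here is a small Python sketch of this factor graph. The exact factor tables are an assumption inferred from the description above: weight 2 at distance 0, weight 1 at distance 1, and 0 otherwise, for both the observation and the transition factors. The weight of a full assignment is then the product of all five factors.

```python
def dist_weight(a, b):
    # Assumed factor table: 2 at distance 0, 1 at distance 1, 0 otherwise.
    return max(0, 2 - abs(a - b))

readings = [0, 2, 2]  # the noisy sensor readings at time steps 1, 2, 3

def weight(x):
    """Product of the observation factors o_i and transition factors t_i."""
    w = 1
    for i, xi in enumerate(x):
        w *= dist_weight(xi, readings[i])   # o_i(x_i): close to the reading
    for i in range(len(x) - 1):
        w *= dist_weight(x[i], x[i + 1])    # t_i(x_i, x_{i+1}): smooth motion
    return w

print(weight((1, 2, 2)))  # 8, the maximum weight assignment from the demo
print(weight((0, 1, 1)))  # 4, the assignment greedy search ends up with
```

These two values match the weights quoted later in the lecture, which is a useful sanity check on the assumed tables.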
[00:03:10] So how can we avoid this? Well, we have to give up on something, and what we're going to give up on is correctness. So what we're going to do is simply not backtrack. [00:03:25] Let's start with something called the greedy search algorithm. Again, we start with the empty assignment, and we consider possible settings of, let's say, X1. Say there are two possible settings; we're just going to choose one of them, whichever one has the highest weight. And the weight, remember, of a partial assignment is the product of all the factors that you can evaluate so far. [00:03:50] So let's pick this one. Again, let's set X2: there are two possible ways to set it; let's pick the better one, and keep on going until we reach a complete assignment, and then we just return that. So
formally, what greedy search is doing is starting with a partial assignment which is empty, and then going through each of the variables X1 through Xn, trying to extend the partial assignment to set Xi. [00:04:24] So for each possible value v that I can assign Xi, I'm going to form a potential candidate partial assignment, call it x_v, then compute the weight of each of these x_v's and choose the one with the highest weight. [00:04:45] An important caveat: this is definitely not guaranteed to find the maximum weight assignment, even though locally it appears to be optimizing, finding the value with the best weight. [00:04:58] So let's look at this demo to see how it works on object tracking. Okay, so here we have the CSP that's defined, and I'm going to step through this algorithm. Initially, I extend
the empty assignment to assignments that only fill in X1. So X1 could be 0, 1, or 2, and these are the weights of these three partial assignments. Remember, the sensor reading was 0, so X1 = 0 has the largest one. [00:05:36] In the next step I prune: I keep only the best candidate, which in this case is X1 = 0. Then I go to i = 2 and extend that assignment to three possible settings of X2, compute their weights, and keep the best one, which in this case is another 1. [00:06:02] And now I extend again to X3: three possible values to set X3, compute the weights of these, well, now complete assignments, and then choose the best one. [00:06:17] So in this case greedy search ends up with the assignment 0, 1, 1, with a weight of 4. And if you remember this example, the best weight assignment had weight 8, so 4 is definitely not the right answer, but it's not zero either; it found something.
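The greedy loop just stepped through can be sketched as follows. This is a hypothetical Python sketch reusing the assumed factor tables from the tracking example; ties are broken toward the first value considered, which is how the walkthrough ends up at 0, 1, 1 rather than the equally weighted 0, 1, 2.

```python
def dist_weight(a, b):
    # Assumed factor table: 2 at distance 0, 1 at distance 1, 0 otherwise.
    return max(0, 2 - abs(a - b))

readings = [0, 2, 2]

def partial_weight(x):
    """Product of all factors evaluable on the first len(x) variables."""
    w = 1
    for i, xi in enumerate(x):
        w *= dist_weight(xi, readings[i])
    for i in range(len(x) - 1):
        w *= dist_weight(x[i], x[i + 1])
    return w

def greedy(domain=(0, 1, 2), n=3):
    x = ()  # start with the empty assignment
    for _ in range(n):
        # extend with each value, keep only the single best partial assignment
        x = max((x + (v,) for v in domain), key=partial_weight)
    return x

best = greedy()
print(best, partial_weight(best))  # (0, 1, 1) with weight 4, as in the demo
```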
so four is definitely not the right answer but [00:06:36] is definitely not the right answer but it's not zero either it found something [00:06:42] okay so what's the problem with 3d [00:06:44] okay so what's the problem with 3d search is that it's too myopic it only [00:06:46] search is that it's too myopic it only keeps the best canada single best [00:06:49] keeps the best canada single best candidate [00:06:50] candidate so beam search is just the natural [00:06:52] so beam search is just the natural generalization [00:06:53] generalization of greedy where i'm keeping at most k [00:06:57] of greedy where i'm keeping at most k candidates at each level [00:07:00] candidates at each level so let's say k equals four [00:07:03] so let's say k equals four so i'm going to start with empty [00:07:06] so i'm going to start with empty assignment i'm going to x uh extend [00:07:11] assignment i'm going to x uh extend and then i don't need to prune because [00:07:13] and then i don't need to prune because there's only two [00:07:14] there's only two um possible partial assignments here [00:07:17] um possible partial assignments here and i have a capacity of four [00:07:19] and i have a capacity of four i'm going to extend again [00:07:21] i'm going to extend again again i don't need to prune [00:07:23] again i don't need to prune but then next i'm going to extend each [00:07:26] but then next i'm going to extend each of [00:07:27] of the elements on my beam the partial [00:07:30] the elements on my beam the partial assignments [00:07:31] assignments extend each of these [00:07:33] extend each of these and now i have eight and i now i need to [00:07:35] and now i have eight and i now i need to reduce the eight partial assignments to [00:07:38] reduce the eight partial assignments to four [00:07:40] four and to do this i'm going to simply [00:07:42] and to do this i'm going to simply compute the weight of each of these [00:07:44] compute the weight of each of these eight 
partial assignments and then take the four which have the highest weight. [00:07:49] Now let's suppose those are these four; then I'll continue, only expanding the ones I've kept, then again keeping the top four, and so on. [00:08:03] So notice that, visually, I'm exploring only a very, very small fraction of the tree, but I'm doing it somewhat holistically, looking down the tree; I could be exploring different parts of the tree at the same time. [00:08:26] So, formally, beam search keeps at most K candidate partial assignments. I'm going to initialize the candidate set C to be just the single partial assignment which is empty. Now, again like greedy search, I'm going to go through the variables one at a time and extend: I'm going to consider each partial
assignment in c and each possible value [00:08:55] and each possible value that i can assign x i [00:08:58] that i can assign x i and i'm gonna do perform the extend [00:09:01] and i'm gonna do perform the extend uh the assignment [00:09:03] uh the assignment and i'm just gonna keep track a c prime [00:09:06] and i'm just gonna keep track a c prime is going to be the new set of candidates [00:09:09] is going to be the new set of candidates and then now i'm going to prune that set [00:09:12] and then now i'm going to prune that set by computing the weight for each [00:09:14] by computing the weight for each element of c prime [00:09:17] element of c prime and just keeping the top k elements [00:09:22] and just keeping the top k elements so this is not guaranteed to find the [00:09:23] so this is not guaranteed to find the maximum weight assignment [00:09:25] maximum weight assignment either [00:09:27] either but sometimes it works better so let's [00:09:29] but sometimes it works better so let's look at this example [00:09:33] object tracking [00:09:35] object tracking and [00:09:36] and i stay extend from the empty assignment [00:09:39] i stay extend from the empty assignment to get three partial assignments to x1 [00:09:43] to get three partial assignments to x1 um [00:09:44] um i prune to the top three so nothing gets [00:09:48] i prune to the top three so nothing gets removed and then extend [00:09:50] removed and then extend so each of these three [00:09:53] so each of these three partial assignments gets extended into [00:09:56] partial assignments gets extended into three additional ones now i have nine [00:10:00] three additional ones now i have nine and now i'm going to prune down from [00:10:02] and now i'm going to prune down from nine to three so that will keep [00:10:05] nine to three so that will keep all the [00:10:07] all the assignments here with a positive void [00:10:10] assignments here with a positive void and now i extend again [00:10:12] and 
now i extend again to [00:10:14] to find settings of x3 [00:10:17] find settings of x3 compute each of these weights and then [00:10:19] compute each of these weights and then i'm going to take the top [00:10:23] assignments [00:10:26] okay so now [00:10:28] okay so now notice that [00:10:29] notice that the top assignment that i have right now [00:10:32] the top assignment that i have right now is [00:10:33] is one two two with a weight of eight [00:10:36] one two two with a weight of eight in this case i got lucky and i found [00:10:39] in this case i got lucky and i found actual max weight assignment but in [00:10:42] actual max weight assignment but in general you won't be guaranteed [00:10:49] okay so what is the time complexity of [00:10:51] okay so what is the time complexity of beam search because one of the [00:10:53] beam search because one of the advantages is that's supposed to be fast [00:10:56] advantages is that's supposed to be fast so let's do a simple calculation here so [00:10:58] so let's do a simple calculation here so suppose we have n variables which is the [00:11:00] suppose we have n variables which is the depth of this tree [00:11:01] depth of this tree and suppose that each of the variables [00:11:04] and suppose that each of the variables has [00:11:05] has v elements which is going to be the [00:11:07] v elements which is going to be the branching factor here [00:11:09] branching factor here and then the beam size is k [00:11:11] and then the beam size is k okay so what is the time [00:11:12] okay so what is the time that it takes to run beam search [00:11:15] that it takes to run beam search it's going to be for each of [00:11:18] it's going to be for each of the variables [00:11:19] the variables each level of this tree [00:11:22] each level of this tree we're going to have a set of candidates [00:11:24] we're going to have a set of candidates which is of size k [00:11:27] which is of size k and [00:11:28] and the extension [00:11:30] 
the extension is is going to take each of these k [00:11:32] is is going to take each of these k and extend it into b candidates so then [00:11:35] and extend it into b candidates so then i'm going to end up with kb [00:11:38] i'm going to end up with kb extended candidates total [00:11:40] extended candidates total and then i'm going to take have to take [00:11:43] and then i'm going to take have to take the top k [00:11:44] the top k so the time it takes to take a list of [00:11:47] so the time it takes to take a list of kb elements and select the top k [00:11:51] kb elements and select the top k elements is kb log k by building a heap [00:11:56] elements is kb log k by building a heap so the total time is n kb log k [00:11:59] so the total time is n kb log k and importantly [00:12:01] and importantly this is linear in the number of [00:12:04] this is linear in the number of variables whereas backtracking search [00:12:06] variables whereas backtracking search would be exponential and the number of [00:12:08] would be exponential and the number of variables [00:12:11] okay so let us summarize now [00:12:14] okay so let us summarize now so beam search is a fairly simple [00:12:16] so beam search is a fairly simple heuristic to [00:12:18] heuristic to approximate uh maximum weight [00:12:20] approximate uh maximum weight assignments and it's really done if [00:12:22] assignments and it's really done if you're really in a hurry and you don't [00:12:24] you're really in a hurry and you don't really care about getting maximum weight [00:12:25] really care about getting maximum weight assignment because you probably won't [00:12:28] assignment because you probably won't so um the nice thing about beam search [00:12:30] so um the nice thing about beam search is it has this parameter k which allows [00:12:32] is it has this parameter k which allows you to control the trade-off between [00:12:34] you to control the trade-off between efficiency and accuracy [00:12:37] 
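As a concrete sketch of the procedure just described (this code is my illustration, not from the lecture), here is a minimal beam search in Python. The `partial_weight` scorer encodes the running object-tracking example, with factor values inferred from the worked weights in this lecture series (2 for exact agreement, 1 for a difference of one, 0 otherwise; that exact shape is an assumption):

```python
import heapq

def beam_search(domains, partial_weight, k):
    """Keep at most k partial assignments per level; prune by weight."""
    candidates = [()]  # the single empty assignment
    for domain in domains:
        # Extend every candidate on the beam with every possible next value.
        extended = [c + (v,) for c in candidates for v in domain]
        # Prune the kb extended candidates down to the top k
        # (kb log k time via a heap).
        candidates = heapq.nlargest(k, extended, key=partial_weight)
    return max(candidates, key=partial_weight)

# Object-tracking example: noisy sensor readings 0, 2, 2.
OBS = [0, 2, 2]

def factor(a, b):
    # Assumed factor values: 2 if equal, 1 if off by one, 0 otherwise.
    return 2 if a == b else (1 if abs(a - b) == 1 else 0)

def partial_weight(prefix):
    # Product of the factors evaluable on a partial assignment:
    # observation factors o_i, plus transition factors t_i seen so far.
    w = 1
    for i, x in enumerate(prefix):
        w *= factor(x, OBS[i])
        if i > 0:
            w *= factor(prefix[i - 1], x)
    return w

best = beam_search([[0, 1, 2]] * 3, partial_weight, k=3)
print(best, partial_weight(best))  # (1, 2, 2) 8
```

With k = 1 this degrades to greedy search and, on this example, gets stuck at a weight of 4, matching the lecture's discussion.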
[00:12:39] So if you're really in a hurry, you set k equals one and you just get greedy search, which sometimes actually gets you pretty good answers. And as you increase k more and more, if you increase k to infinity, then you'll definitely search the entire search tree and you will get the optimal answer, but this takes basically exponential time. One thing to note about beam search with k equals infinity is that it is performing a breadth-first search of the tree, because it proceeds level by level and explores all the nodes in the tree systematically.

[00:13:17] So using this analogy, I want to end with a final note, which is that backtracking search is really like doing a depth-first search on the search tree: it dives deeply down to one complete assignment, then backtracks and finds another complete assignment, and backtracks again, looking at kind of one assignment at a time. Beam search, on the other hand, is more akin to breadth-first search, where we're proceeding level by level; but the main difference from breadth-first search is that we're doing this heuristic pruning at each level, to make sure that we don't have too many candidates. And the way it's doing that pruning is based on the factors that it can evaluate so far. So for beam search to work, you really need it to be the case that the factors are local and can be evaluated as much as possible along the way, not all at the very end.

[00:14:21] All right, so that's the end of this module.

================================================================================
LECTURE 029
================================================================================
Constraint Satisfaction Problems (CSPs) 7 - Local Search | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=VwZKPlK6jUg

---

Transcript

[00:00:05] Hi, in this module I'm going to talk about local search, a strategy for approximately computing the
maximum weight assignment of a constraint satisfaction problem.

[00:00:16] So remember that a CSP is defined by a factor graph, which includes a set of variables x1 through xn and a set of factors f1 through fm, where each factor is a function that depends on a subset of the variables and returns a non-negative number. Each assignment to all the variables has a weight, given by the product of all the factors evaluated on that assignment, and the objective is to compute the maximum weight assignment, as usual.

[00:00:48] So far we've seen backtracking search and beam search, and both of these search algorithms work by extending partial assignments: you start with the empty assignment, then you assign one variable, and you assign another variable, until you get to a complete assignment, and then maybe you backtrack or maybe you don't.

[00:01:09] So local search is going to be a little bit different: it's going to modify complete assignments. You start with a random assignment, and then you choose one variable and change it, choose another variable and change it, and so on; more kind of like house maintenance rather than building a house.

[00:01:28] One of the advantages of local search is that it gives you additional flexibility: you can pick any variable and try to improve it. Whereas with backtracking search and beam search, you have to do things in a certain order. With beam search, once you've assigned a variable, you can't go back. And with backtracking search, you can backtrack, but you can't really backtrack out of order.

[00:01:56] So recall our running example, object tracking. At each time step you observe a noisy sensor reading of a particular object: you observe 0, 2, and 2 as the positions of the object, and you're trying to
figure out where this object was.

[00:02:15] So we did model this as a CSP, where we have three observation factors: o1, which favors x1 equals 0; o2, which favors x2 equals 2; and o3, which favors x3 equals 2. And we have two transition factors that favor subsequent positions being close by.

[00:02:45] So let's jump in, and suppose we just have a complete assignment (0, 0, 1). Okay, the question is: how do we improve this? Well, let's look at the weight of this assignment. The weight is 2, because 0 agrees with 0; times 2, because 0 agrees with 0; times 0, uh-oh, because these two are too far apart; times 1, because these only differ by one; and times 1, because these differ by one. But you get a zero, so that's not a very good assignment. So how can we improve? Let's try to reassign x2 to something else: let's assign it some value v, so we can set v equals 0, 1, or 2, and for each of these alternate assignments compute its weight. Then we simply take the assignment with the best weight, in this case the one which sets x2 to be 1. Then we end up with a new assignment which is better than the old one, so mission accomplished.

[00:04:05] So we can refine this strategy a little bit more. Suppose we're trying to reassign x2. The weight of a new assignment where x2 has been replaced with some v is as follows: you're multiplying all the factors in the CSP together, o1, t1, o2, t2, o3. But note that only some of the factors depend on v; in particular, o1 and o3 don't depend on v, so no matter what v is, these are the same, which means we can ignore them and just evaluate the factors that involve x2. So this is an idea of locality, which leverages the
structure of the CSP: when evaluating possible reassignments to some variable x_i, we only need to consider the factors that depend on x_i. So in a factor graph where there are lots and lots of variables, and you're trying to reassign one variable which might have a small neighborhood, you're saving a lot of effort.

[00:05:19] So now we're ready to define our local search algorithm, which is called iterated conditional modes. Sounds fancy, but it's really simple. The idea is that we're going to start with x being a random complete assignment, and we're going to loop through x1 through xn, and keep on going until we converge or we run out of time. What we do is try to reassign x_i: we consider each possible value v that x_i could take on, and update the current assignment x with that value. Okay, so this produces an assignment x_v, and then we compute the weights of each of these x_v's and choose the one with the highest weight. Remember that in computing the weight, we only need to evaluate the factors that touch x_i. Also notice that this looks remarkably like greedy search or beam search, but there is a substantial difference: here the x's are complete assignments, not partial assignments, so this is not extending an assignment so much as replacing x_i with a new value.

[00:06:43] So pictorially, what this looks like is: you start with x1. By convention, unshaded nodes are the ones that are meant to be reassigned, and shaded ones are the ones that are fixed. So you pick x1 and you say: can I change it to make it better? Then you pick some value of x1, and you go to x2 and say: can I change x2
to make this assignment better? And then you go to x3, and then you go back to x1 and say: hey, can I make it better by changing x1 again? You keep on going until it converges.

[00:07:24] So here's a demo on the object tracking example. At the start of the algorithm, we initialize with a random assignment, (0, 1, 2), which has a weight of 4. Now I'm going to try to maximize x1 given everything else. Let's consider alternative values of x1: it could be 0, 1, or 2. For each of these I compute its weight, only evaluating the factors that touch x1; in this case it's only o1 and t1 that touch x1, so I only need to evaluate those. I compute the weights and choose the best one, breaking ties arbitrarily, so I choose x1 equals 0, which means I didn't change anything. So now let me step: now I'm looking at x2. Can I change anything? Nope. What about x3? Well, I compute the weights, and here I'm choosing x3 to be 1, so I change that assignment. Now I go back to x1 and iterate, and it looks like I've converged, because I'm not changing anything.

[00:09:00] So I've converged to an assignment with a weight of 4, which, if you remember, is not the optimum: the maximum weight assignment has weight eight. So again, iterated conditional modes is going to give you an okay solution, but not necessarily the best one.

[00:09:26] So, convergence properties. The good news is that the weight of your assignment is not going to go down; it always increases or stays the same each iteration. And this is because when you're trying to reassign a variable, you can always
choose the old value and maintain the same weight, so any change must be increasing the weight. This means that it converges in a finite number of iterations, because there's only a finite number of possible assignments, so you can only increase the weight a finite number of times. It can get stuck at a local optimum, as we've seen, and it's not generally guaranteed to find the optimum assignment.

[00:10:16] So just a quick note: there are two ways around this. One is that there is a version of this where you can change two variables, or maybe three variables, at a time, and that allows you to perhaps get out of your local optimum. Another thing we can do is add randomness: at each step, we could either choose the best option or just choose a random option, and this will also allow us to escape these local optima. Or we can use something like Gibbs sampling, which I'll talk about in a future module, which will add stochasticity to ICM.

[00:11:01] Okay, so here is the summary. Let me actually summarize all the search algorithms for CSPs that we've encountered. First we looked at backtracking search, where the strategy is to extend partial assignments and then backtrack when we get to a complete assignment. Backtracking search is exact: it computes the actual maximum weight assignment, and it's the only algorithm we're considering that does that in general. But the main problem is that the time can be exponential in the number of variables.

[00:11:46] Then we looked at beam search, which also extends partial assignments, and here we're trading off accuracy for time. So this is approximate; it will only give you an okay solution, but
it's linear in the number of variables.

[00:12:06] And for local search, we saw iterated conditional modes, which does local search by choosing the best value of a variable at each given time. It's a different strategy: here we're starting with complete assignments and modifying them to make them better. So it's also approximate, but it's fast, just like beam search.

[00:12:34] Okay, so that concludes this module.

================================================================================
LECTURE 030
================================================================================
Markov Networks 1 - Overview | Stanford CS221: Artificial Intelligence (Autumn 2021)
Source: https://www.youtube.com/watch?v=neeaJb3wCYw

---

Transcript

[00:00:05] Hi, in this module I'm going to be talking about Markov networks. So far we've introduced constraint satisfaction problems, the first of our variable-based models. Now we're going to talk about Markov networks, the second type of variable-based model, which will connect factor graphs with probability, and this will be a stepping stone along the way to Bayesian networks.

[00:00:28] So recall that variable-based models are all based on factor graphs, and Markov networks are no different. Remember that a factor graph consists of a set of variables x1 through xn and a set of factors f1 through fm, where each factor takes a subset of the variables and returns a non-negative number. If you multiply all of these numbers together, you can evaluate the weight of a particular assignment.

[00:00:58] So let's look at the example of object tracking. Here, remember, the goal is: over time, we record a noisy sensor reading of an object's position, at 0, 2, and 2, and the goal is to figure out the actual trajectory of this object. We model this as a factor graph as follows, where we have a number of factors representing the affinity for x1 to be close to 0, x2 to be close to 2, and x3 to be close to 2, and also for adjacent positions to be close to each other.
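The factor graph just described can be sketched in code (my illustration, not from the lecture; the exact factor values, 2 for agreement, 1 for a difference of one, 0 otherwise, are an assumption inferred from the worked weights in these lectures):

```python
OBS = [0, 2, 2]  # noisy sensor readings for x1, x2, x3

def close(a, b):
    # Assumed factor values: 2 for exact agreement,
    # 1 for a difference of one, 0 otherwise.
    return 2 if a == b else (1 if abs(a - b) == 1 else 0)

# The factor graph: observation factors o1..o3 and transition factors t1, t2.
factors = [
    lambda x: close(x[0], OBS[0]),  # o1: x1 near 0
    lambda x: close(x[1], OBS[1]),  # o2: x2 near 2
    lambda x: close(x[2], OBS[2]),  # o3: x3 near 2
    lambda x: close(x[0], x[1]),    # t1: x1 near x2
    lambda x: close(x[1], x[2]),    # t2: x2 near x3
]

def weight(x):
    # Weight of an assignment = product of all factors evaluated on it.
    w = 1
    for f in factors:
        w *= f(x)
    return w

print(weight((1, 2, 2)))  # prints 8, the lecture's maximum weight assignment
```

Under these assumed values, the assignments from the earlier demos check out: (0, 1, 2) gets weight 4, and (0, 0, 1) gets weight 0 because one transition factor is violated.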
[00:01:42] So before, we treated this factor graph as a constraint satisfaction problem, where the goal is to find the maximum-weight assignment. In this particular example we can look at all the possible assignments, each of which has a weight, and find that the maximum-weight assignment is (1, 2, 2). But just returning a single maximum-weight assignment doesn't really give us the full picture. In particular, it doesn't represent how certain we are of this assignment, or say anything about all the other possibilities.

[00:02:22] So the goal of Markov networks is to try to capture this uncertainty over assignments using the language of probability. We've actually done most of the hard work already by setting up factor graphs; the only remaining part is to connect factor graphs with probability. So formally, a Markov network, or Markov random field as
it's sometimes called, is a factor graph which defines a joint distribution over a set of random variables x1 through xn. Before, these were just variables; now they're random variables, because we'll be talking about probabilities.

[00:02:57] Remember that the factor graph gives us a weight for each possible assignment x, and to convert this weight into a probability we just need to normalize it. What I mean by that is: I look at all possible assignments and their weights, and I define Z as the sum of all the weights. Z is called the normalization constant, or sometimes the partition function. Then I just divide each weight by Z. This produces something that sums to 1, and I define that as the joint distribution P(X = x) = Weight(x) / Z. Okay, so let's do this example.
[00:03:46] Here we have x1, x2, x3 and the weight of each assignment x; we have six possible nonzero-weight assignments, with particular weights. We add all these weights up, and that gives us the partition function Z, which is 26 here. Then we divide each of these weights by 26 to produce the joint probability.

[00:04:16] And so now this probability distribution represents the uncertainty in the problem. Notice that while (1, 2, 2) was the maximum-weight assignment, and it still is, the probability gives us a more nuanced picture, which is that we're only 31 percent sure that that is actually the true trajectory of the object. This could be useful information: there's a big difference between 31 percent and 90 percent.
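To make the normalization step concrete, here is a small Python sketch. The weight table is hypothetical: six nonzero-weight assignments over (x1, x2, x3), with weights chosen so that Z = 26 as quoted in the lecture (the exact table on the slides may differ):

```python
# Hypothetical weights for the six nonzero-weight assignments (x1, x2, x3),
# chosen so the partition function Z = 26, matching the lecture's numbers;
# all other assignments have weight 0 and are omitted.
weights = {
    (0, 1, 1): 4,
    (1, 1, 1): 4,
    (0, 1, 2): 4,
    (1, 1, 2): 4,
    (1, 2, 2): 8,
    (2, 2, 2): 2,
}

# Normalization constant (partition function): the sum of all weights.
Z = sum(weights.values())

# Joint distribution: each weight divided by Z, so the values sum to 1.
joint = {x: w / Z for x, w in weights.items()}

print(Z)                           # 26
print(round(joint[(1, 2, 2)], 2))  # 0.31: the max-weight assignment
```

Note that the max-weight assignment keeps its rank after normalization; dividing by Z only rescales the weights into probabilities.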
[00:04:46] But wait, we can do more than that. The language of probability allows us to answer other questions besides just the probabilities of whole assignments. For example, suppose we wanted to know where the object was at time step 2. That is: what is the value of the random variable x2, where I don't care about x1 and x3? This query is captured by a quantity called the marginal probability. The marginal probability that a particular random variable Xi equals a particular value v, written P(Xi = v), is given by summing the joint distribution, which we defined on the previous slide, over all possible full assignments x such that xi = v, that is, all assignments consistent with this condition.

[00:05:46] So now let's look at this object tracking example again. We have the joint probability table that we computed on the previous slide, and now let's compute some marginal probabilities. First, let's
consider: what is the probability that x2 equals 1? To do that, we look at all the rows where x2 is 1, which are these first four here, and then we just add up their probabilities, 0.15 + 0.15 + 0.15 + 0.15, and that gives us 0.6. Now we can issue another marginal probability query: what's the probability that x2 equals 2? We look at all the rows where x2 is 2, which are these last two rows, add up their probabilities, and that gives us 0.38. (There's some rounding error here, which is why these don't add up exactly.) [00:06:42] Okay, so that allows us to answer marginal probability queries.
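A marginal query is then just a filtered sum over the joint table. Here is a sketch; the joint table is hypothetical (weights picked so that Z = 26 and the marginals land near the lecture's 0.62 and 0.38):

```python
# Hypothetical joint table over assignments (x1, x2, x3), weights summing to 26.
weights = {(0, 1, 1): 4, (1, 1, 1): 4, (0, 1, 2): 4, (1, 1, 2): 4,
           (1, 2, 2): 8, (2, 2, 2): 2}
Z = sum(weights.values())
joint = {x: w / Z for x, w in weights.items()}

def marginal(i, v):
    """P(X_i = v): sum the joint over all full assignments with x_i = v."""
    return sum(p for x, p in joint.items() if x[i] == v)

print(round(marginal(1, 1), 2))  # P(x2 = 1) = 16/26, about 0.62
print(round(marginal(1, 2), 2))  # P(x2 = 2) = 10/26, about 0.38
```

Working with the exact fractions (16/26 and 10/26) rather than the rounded table entries avoids the rounding error mentioned above.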
[00:06:46] One thing you might note is that the answer here is actually different from what you get if you just look at the max-weight assignment. In particular, the maximum-weight assignment says the most likely full assignment is (1, 2, 2), and if you look at x2 there, it says 2. But notice that that is not the most likely value under the marginal probability: the most likely value for x2 under the marginal is 1, and it has a 62 percent chance of being 1 in that case.

[00:07:23] So the intuition here is that while this trajectory does indeed have the largest weight, there is actually a lot of decentralized evidence for x2 = 1, from assignments which individually have less weight but have strength in numbers: if you add up all their weights, they actually outnumber the evidence for x2 = 2. This is an important lesson: what answer you get really depends on the type of question you're asking. In this case, if you're really interested in where the object is at time step 2, then marginal probability is the right thing to ask for.

[00:08:06] So let's look at a particular example. The Ising model is a very
canonical example that dates back to the 1920s, from statistical physics, and the idea is that it's a model of ferromagnetism.

[00:08:24] The idea is that you have a Markov network which contains a bunch of different sites. Each site is denoted xi, which can take on two values, -1 and +1: -1 represents a down spin and +1 represents an up spin. Furthermore, these variables are related by factors: we'll call the factor f_ij, which connects site i and site j; it depends on the spin of site i and the spin of site j, and it equals exp(β · xi · xj).

[00:09:14] Okay, so the intuition is that we want neighboring sites to have the same spin. By multiplying the two spins together, if both of them have the same sign the product is going to be 1, and if
they have opposite signs the product is going to be -1. And β here is a scaling factor that says how strong the affinity is. If β is 0, then the factor is just exp(0), which is 1, so there's no connection between the sites; and as β increases, the affinity becomes stronger, and the difference between agreeing and not agreeing becomes heightened.

[00:09:56] One thing Ising models are useful for is studying phase transitions in physical systems. Here is an example of what happens as β increases. If β is close to 0, then you basically get unstructured configurations where each site behaves independently; in fact, if β is 0, then all assignments are equally likely. As β increases, you'll see that more and more coherence happens, where neighboring sites really like to agree.
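To make the factor concrete, here is a sketch of the Ising pairwise factor and the weight of a full spin configuration; the 2×2 grid of sites and its edge list are made up for illustration:

```python
import math

def ising_factor(xi, xj, beta):
    """Pairwise factor f_ij(x_i, x_j) = exp(beta * x_i * x_j), spins in {-1, +1}."""
    return math.exp(beta * xi * xj)

def config_weight(x, edges, beta):
    """Weight of a full spin assignment: the product of factors over all edges."""
    w = 1.0
    for i, j in edges:
        w *= ising_factor(x[i], x[j], beta)
    return w

# Hypothetical 2x2 grid of sites 0..3 with 4 neighbor edges.
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]
aligned = [+1, +1, +1, +1]   # all spins agree
mixed = [+1, -1, +1, -1]     # neighbors disagree on two of the edges

# With beta = 0 every configuration has weight exp(0)^4 = 1 (no coupling);
# with beta > 0 the aligned configuration is favored.
print(config_weight(aligned, edges, 0.0))  # 1.0
print(config_weight(aligned, edges, 0.5) > config_weight(mixed, edges, 0.5))  # True
```

Setting beta to 0 in this sketch reproduces the "all assignments equally likely" regime; raising it heightens the gap between aligned and mixed configurations.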
[00:10:34] But of course there are going to be some sharp ridges where two neighbors have to disagree. How we sample from this model is going to be a topic for another module.

[00:10:51] So here is another canonical application of Markov networks, from computer vision, where this used to be very popular. The idea is that you take a noisy image and you want to denoise it into a clean image. We'll present a very stylized, simple example of this. Here is our three-by-five image: each site is a pixel, and xi, which is either 0 or 1, is the pixel value, which is unknown; we're modeling the clean image. We assume that only a subset of the pixels are observed, maybe this one, this one, this one, this one, and this one, and the goal is to fill in the rest of the pixels. So
we can capture an observation by an observation potential o_i(x_i), which is 1 if x_i agrees with the observation and 0 if it doesn't. This is a hard constraint that says: where I observed a value, x_i must take on that value; so this one has to be 0, this one has to be 1, and so on. [00:12:04] And finally, we have transition factors that say neighboring pixels are more likely to be the same than different, again the same intuition as in the Ising model. We denote this t_ij, and it equals 2 if the two neighboring pixels agree and 1 if they disagree.
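The two kinds of factors for the denoising model can be sketched directly; the dictionary of observed pixels below is hypothetical:

```python
# Hypothetical observations: pixel index -> observed value (0 or 1).
observed = {0: 0, 7: 1, 14: 1}

def observation_potential(i, xi):
    """o_i(x_i): hard constraint. 1 if x_i agrees with the observed value
    (or pixel i was never observed), 0 if it contradicts an observation."""
    if i not in observed:
        return 1
    return 1 if xi == observed[i] else 0

def transition_factor(xi, xj):
    """t_ij(x_i, x_j): neighboring pixels prefer to agree.
    2 if the two pixel values are equal, 1 if they differ."""
    return 2 if xi == xj else 1

print(observation_potential(0, 0))  # 1: agrees with the observation
print(observation_potential(0, 1))  # 0: contradicts it, weight forced to zero
print(transition_factor(1, 1))      # 2: neighbors agree
print(transition_factor(1, 0))      # 1: neighbors disagree
```

Because the observation potential can be 0, any assignment that contradicts an observed pixel gets total weight 0, which is exactly the hard-constraint behavior described above.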
[00:12:35] So let me summarize Markov networks. You can think of them succinctly as taking factor graphs and marrying them with probability. Factor graphs have already done a lot of the work: they allow you to specify a non-negative weight for every assignment, and all we have to do is normalize that to get a probability distribution. Once we have the probability distribution, we can answer all sorts of queries, for example computing marginal probabilities, which allow us to pinpoint individual variables and ask questions about them.

[00:13:13] It's also useful to compare Markov networks with CSPs. In CSPs we talked about variables; in Markov networks we call them random variables. They behave like variables, but they're random variables because we're endowing them with a probabilistic interpretation. In CSPs we talked about weights; in Markov networks we talk about probabilities, which are the normalized weights. And the main difference is that in CSPs we were trying to find the maximum-weight assignment, while in Markov networks we look at the distribution over assignments holistically,
and answering questions about marginal probabilities, which gives us a more nuanced idea of the set of possible assignments. [00:14:04] Okay, that's it for this module.

================================================================================ LECTURE 031 ================================================================================ Markov Networks 2 - Gibbs Sampling | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=k6aZZF2pk7k --- Transcript

[00:00:06] Hi. In this module I'm going to talk about Gibbs sampling, a simple algorithm for approximately computing marginal probabilities.

[00:00:15] So recall that a Markov network is based on a factor graph, and a factor graph gives a weight to every possible assignment of the variables in that factor graph. In a Markov network we convert that weight into a probability by first computing the normalization constant, which is the sum over all assignments of the weight of that assignment; we divide by that normalization constant, and we get the probability of an assignment little x.

[00:00:49] So in this object tracking example,
we have a bunch of different assignments and their weights. The partition function in this case is 26; we divide each of these weights by 26 and we get these probabilities.

[00:01:06] The cool thing with Markov networks is that you can compute marginal probabilities, and that's going to be our focus today. A marginal probability focuses on one particular variable xi and asks what values it could take on. To get it, we sum the joint probability over all possible assignments where xi actually equals the value in question. In this example, if you ask for the probability that x2 equals 1, you sum over all the rows where x2 is 1, and that gives you 0.62; and if you ask for x2 equals 2, you sum over the last two rows, and that gives you 0.38.

[00:01:56] So now let me present Gibbs sampling,
a simple algorithm for approximately computing these marginals. You could iterate over all possible assignments and compute them exactly, but that would take exponential time. Gibbs sampling follows the template of local search, where we go through each variable one at a time and update it; but unlike iterated conditional modes, which we saw before, Gibbs sampling is a randomized algorithm, tailored for the purpose of computing marginals.

[00:02:27] So let's present the algorithm. We initialize the assignment to some completely random assignment, and then we loop through each of the variables until convergence, which I'll talk about a little bit later. For each variable, we set xi = v with probability P(Xi = v | X_{-i} = x_{-i}). This x_{-i} notation
just refers to all the variables except for xi. I'll come back to this in a second, but let me highlight the general flow of the algorithm. Suppose you have three variables. Gibbs sampling will sample x1 holding the other ones fixed, then move on to x2 holding the others fixed and update x2, then go to x3, and then cycle back to x1, x2, x3, and so on.

[00:03:31] Now, how do I sample xi = v? Here is one example. What we do is try assigning xi = v and computing the resulting weight, so for every possible value of x2 I get some weight. Now, remember, in ICM I would simply take the value that produces the largest weight. The main difference with Gibbs sampling is that I take these weights and
normalize them to produce a probability distribution: normalizing means summing these values, which here gives 5, and dividing by 5 to get the probabilities 0.2, 0.4, 0.4. Then I sample one of these values for x2 according to this probability distribution. [00:04:29] You can visualize that sampling process with the interval from 0 to 1, where I have a number of segments representing the different possible values of x2, and each segment's length is exactly the corresponding probability: the probability of x2 = 0, the probability of x2 = 1, and the probability of x2 = 2.
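The "dart" described next is just sampling from a categorical distribution via cumulative segment lengths. A minimal sketch, where the weights 1, 2, 2 mirror the example that normalizes to 0.2, 0.4, 0.4:

```python
import random

def sample_categorical(probs, rng=random):
    """Throw a one-dimensional dart at [0, 1): lay the probabilities out as
    consecutive segments and return the index of the segment the dart hits."""
    dart = rng.random()
    cumulative = 0.0
    for value, p in enumerate(probs):
        cumulative += p
        if dart < cumulative:
            return value
    return len(probs) - 1  # guard against floating-point round-off

# Weights 1, 2, 2 normalize to probabilities 0.2, 0.4, 0.4.
weights = [1, 2, 2]
probs = [w / sum(weights) for w in weights]

random.seed(0)
counts = [0, 0, 0]
for _ in range(10_000):
    counts[sample_categorical(probs)] += 1
# Empirical frequencies should be close to 0.2, 0.4, 0.4.
print([c / 10_000 for c in counts])
```

Running many darts and counting where they land is exactly the "counting and normalizing" that the estimate at the end of the algorithm relies on.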
[00:04:54] Then I throw a one-dimensional dart at this line; I hit it somewhere, and I take whatever value is specified by the segment I land in. Okay, so now I have a new value for x2, and I proceed to the next variable.

[00:05:18] So that produces a sequence of sampled assignments, and the main remaining thing to do is to aggregate them. Every time I go through this loop, I increment a counter, for variable i, of the particular value that I saw. At the very end, I compute an estimate p̂(Xi = xi), which is simply the normalized version of the count: the relative frequency of seeing a particular value little xi compared to everything else I've seen. So really, it's just a lot of counting and normalizing.
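Putting the pieces together, here is a sketch of a full Gibbs sampler on a small three-variable chain. The factors are made up (observation factors pulling x1 toward 0 and x2, x3 toward 2, plus transition factors preferring nearby positions); they are in the spirit of the object-tracking example but do not reproduce the exact table from the slides:

```python
import random

obs = [0, 2, 2]  # hypothetical noisy sensor readings for x1, x2, x3

def weight(x):
    """Weight of a full assignment: observation factors times transition factors."""
    w = 1.0
    for i in range(3):
        w *= 2.0 if x[i] == obs[i] else 1.0             # pull x_i toward its reading
    for i in range(2):
        w *= 2.0 if abs(x[i] - x[i + 1]) <= 1 else 1.0  # adjacent positions stay close
    return w

def gibbs(num_sweeps=20_000, seed=0):
    rng = random.Random(seed)
    x = [rng.randrange(3) for _ in range(3)]  # completely random initial assignment
    counts = [[0, 0, 0] for _ in range(3)]    # counts[i][v]: times we saw x_i = v
    for _ in range(num_sweeps):
        for i in range(3):
            # Conditional distribution P(x_i = v | x_{-i}): weigh each candidate
            # value with the other variables held fixed, then normalize.
            ws = []
            for v in range(3):
                x[i] = v
                ws.append(weight(x))
            z = sum(ws)
            dart, cum = rng.random() * z, 0.0
            for v, wv in enumerate(ws):
                cum += wv
                if dart < cum:
                    x[i] = v
                    break
            counts[i][x[i]] += 1  # increment the counter for the value we saw
    # Estimated marginals: normalized counts (relative frequencies).
    return [[c / num_sweeps for c in row] for row in counts]

est = gibbs()
print([round(p, 2) for p in est[1]])  # estimated marginal distribution of x2
```

For a chain this small the marginals can also be computed exactly by enumeration, which is a good way to sanity-check the sampler's estimates.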
counting and [00:06:16] but there's a lot of counting and normalizing [00:06:19] normalizing so let's look at this demo to uh give us [00:06:22] so let's look at this demo to uh give us a [00:06:23] a more fuller sense what's going on okay [00:06:26] more fuller sense what's going on okay so here is the object tracking example i [00:06:27] so here is the object tracking example i have three variables and here i'm going [00:06:30] have three variables and here i'm going i can specify the query which is which [00:06:33] i can specify the query which is which variable am i interested in calculating [00:06:35] variable am i interested in calculating the marginal of [00:06:37] the marginal of and i'm going to run gibbs sampling here [00:06:41] and then at the beginning [00:06:43] and then at the beginning i sample a variable x 1 [00:06:46] i sample a variable x 1 given everything else so consider all [00:06:49] given everything else so consider all the possible values of x1 [00:06:51] the possible values of x1 um i'm going to look at their [00:06:53] um i'm going to look at their potentials or factors computer weight [00:06:57] potentials or factors computer weight normalize [00:06:59] normalize to get a distribution and i'm going to [00:07:01] to get a distribution and i'm going to sample a value according to [00:07:03] sample a value according to these probabilities so this in this case [00:07:06] these probabilities so this in this case is just a coin flip i choose x1 equals [00:07:08] is just a coin flip i choose x1 equals zero [00:07:09] zero and then i update my counter [00:07:12] and then i update my counter so i'm recording that i saw [00:07:14] so i'm recording that i saw x2 equals ones once [00:07:19] x2 equals ones once okay [00:07:20] okay and then i'm going to move on to the [00:07:21] and then i'm going to move on to the next variable x2 do the same thing [00:07:24] next variable x2 do the same thing move to the next variable x3 and do the [00:07:27] move 
to the next variable x3 and do the same thing and i'm going to just cycle [00:07:29] same thing and i'm going to just cycle for this for a moment you can see that [00:07:32] for this for a moment you can see that um [00:07:33] um the the assignment which is depicted up [00:07:36] the the assignment which is depicted up here is changing [00:07:39] here is changing um [00:07:40] um and down here [00:07:42] and down here um i can see that the count [00:07:45] um i can see that the count of the number of times x1 [00:07:47] of the number of times x1 x2 equals 1 has gone up to 25 [00:07:51] x2 equals 1 has gone up to 25 um and now look i actually hit a [00:07:54] um and now look i actually hit a different value x i went to a [00:07:56] different value x i went to a configuration where x2 equals two now [00:07:59] configuration where x2 equals two now um [00:08:00] um and then i might sample a little bit [00:08:03] and then i might sample a little bit more and they'll come back to one [00:08:05] more and they'll come back to one and you can just watch this for a little [00:08:08] and you can just watch this for a little while and you can see over here that [00:08:11] while and you can see over here that these are the estimates of the marginal [00:08:14] these are the estimates of the marginal probability of x2 based on the counts so [00:08:17] probability of x2 based on the counts so these numbers are simply these [00:08:19] these numbers are simply these normalized versions of these [00:08:22] normalized versions of these so i'm going to speed this up a little [00:08:24] so i'm going to speed this up a little bit so let me do just a thousand steps [00:08:27] bit so let me do just a thousand steps at a time [00:08:28] at a time okay so now i have a val if i did a [00:08:31] okay so now i have a val if i did a thousand steps of gibbs sampling now i [00:08:33] thousand steps of gibbs sampling now i have a lot of counts of x uh two equals [00:08:35] have a lot of counts of 
x uh two equals one some counts of x two equals two and [00:08:38] one some counts of x two equals two and now you can see the probabilities are [00:08:42] now you can see the probabilities are kind of converging to something like 0.6 [00:08:44] kind of converging to something like 0.6 and 0.3 let me just hit a step a few [00:08:47] and 0.3 let me just hit a step a few more times and you can see that these [00:08:50] more times and you can see that these probabilities are indeed converging to [00:08:53] probabilities are indeed converging to 0.61 [00:08:55] 0.61 which if you remember from here is [00:08:58] which if you remember from here is pretty close to the true marginal [00:09:00] pretty close to the true marginal probability [00:09:03] probability okay so [00:09:04] okay so it seems you know at first gland kind of [00:09:07] it seems you know at first gland kind of a wild thing right so we're running this [00:09:09] a wild thing right so we're running this algorithm it's just generating samples [00:09:11] algorithm it's just generating samples left and right it's [00:09:13] left and right it's kind of random and yet if i compute the [00:09:17] kind of random and yet if i compute the randomness is very carefully are [00:09:19] randomness is very carefully are orchestrated so that when i sum things [00:09:21] orchestrated so that when i sum things up properly i actually get the right [00:09:24] up properly i actually get the right answer out [00:09:28] so let me now go to the image de-noising [00:09:31] so let me now go to the image de-noising example so here the goal is you're given [00:09:33] example so here the goal is you're given a noisy image clean it up and in our [00:09:37] a noisy image clean it up and in our simplified version [00:09:38] simplified version i have [00:09:39] i have x i which represents the [00:09:42] x i which represents the clean pixel value [00:09:44] clean pixel value which i don't know [00:09:45] which i don't know a subset of the 
pixels are observed so [00:09:48] a subset of the pixels are observed so for example these um in green here [00:09:51] for example these um in green here and i'm going to clamp those pixel [00:09:52] and i'm going to clamp those pixel values to the observed value and then i [00:09:55] values to the observed value and then i have a [00:09:56] have a factor [00:09:58] factor that says neighboring pixels are twice [00:10:00] that says neighboring pixels are twice as likely to be the same than different [00:10:04] as likely to be the same than different so let's do give sampling in this image [00:10:06] so let's do give sampling in this image noise in case so what give sampling [00:10:08] noise in case so what give sampling would do is it's going to sweep across [00:10:11] would do is it's going to sweep across the image [00:10:12] the image and sample each variable condition on [00:10:14] and sample each variable condition on the left [00:10:16] the left so [00:10:17] so suppose [00:10:18] suppose i'm landing on this particular [00:10:21] i'm landing on this particular pixel value and i'm trying to figure out [00:10:22] pixel value and i'm trying to figure out what should its value be [00:10:25] what should its value be so again i look at the possible values [00:10:26] so again i look at the possible values it could be zero or one and for each [00:10:29] it could be zero or one and for each value i'm going to compute a weight [00:10:31] value i'm going to compute a weight so [00:10:32] so remember from icm that i actually don't [00:10:35] remember from icm that i actually don't need to [00:10:36] need to compute the weight of the entire [00:10:38] compute the weight of the entire assignment i just only need to look at [00:10:40] assignment i just only need to look at the factors which are dependent on this [00:10:43] the factors which are dependent on this value [00:10:44] value okay so [00:10:45] okay so let's consider v equals zero [00:10:48] let's consider v equals 
zero so here if i put zero here that means [00:10:50] so here if i put zero here that means this potential is going to be happy [00:10:52] this potential is going to be happy because years agree and i'm gonna get a [00:10:54] because years agree and i'm gonna get a two [00:10:55] two um [00:10:56] um and this one is going to disagree [00:10:59] and this one is going to disagree this one's going to disagree on this one [00:11:02] this one's going to disagree on this one so the weight is 2 times 1 times 1 times [00:11:05] so the weight is 2 times 1 times 1 times 1 which is [00:11:07] 1 which is so now if i try to put a 1 in this [00:11:10] so now if i try to put a 1 in this position [00:11:12] position now [00:11:12] now um this uh potential says one while the [00:11:16] um this uh potential says one while the others say two [00:11:18] others say two so now that has a weight of eight [00:11:21] so now that has a weight of eight so now to get the probability of x i [00:11:23] so now to get the probability of x i equals v [00:11:24] equals v given everything else i'm simply going [00:11:27] given everything else i'm simply going to sum up and normalize so i have 2 and [00:11:30] to sum up and normalize so i have 2 and 8 here the normalization constant [00:11:33] 8 here the normalization constant is 10. so i get probabilities 0.2 and [00:11:36] is 10. 
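That local weight computation can be sketched as a few lines of Python. This is a minimal sketch under the lecture's stated assumptions (a pairwise factor worth 2 when neighboring pixels agree and 1 when they disagree); the neighbor configuration and the name pixel_weight are made up for illustration.

```python
import random

def pixel_weight(v, neighbors):
    """Unnormalized weight for assigning value v to one pixel:
    each neighboring factor contributes 2 if the values agree, 1 if not."""
    w = 1
    for n in neighbors:
        w *= 2 if v == n else 1
    return w

# Hypothetical neighborhood matching the lecture's numbers:
# one neighbor is currently 0, three neighbors are currently 1.
neighbors = [0, 1, 1, 1]
w0 = pixel_weight(0, neighbors)  # 2 * 1 * 1 * 1 = 2
w1 = pixel_weight(1, neighbors)  # 1 * 2 * 2 * 2 = 8
z = w0 + w1                      # normalization constant: 10
p1 = w1 / z                      # P(pixel = 1 | everything else) = 0.8

# Gibbs sampling then sets the pixel to 1 with probability 0.8, else to 0.
new_value = 1 if random.random() < p1 else 0
```

Note that only the factors touching this pixel enter the computation, which is exactly why each Gibbs update is cheap even when the full assignment is large.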
so i get probabilities 0.2 and 0.8 [00:11:37] 0.8 now given this distribution i'm going to [00:11:39] now given this distribution i'm going to set [00:11:40] set this value to [00:11:42] this value to 1 with probability of 0.8 and 0 with [00:11:45] 1 with probability of 0.8 and 0 with probability 0.2 [00:11:46] probability 0.2 and then i'm going to keep on going [00:11:50] so here is a fun little demo of give [00:11:53] so here is a fun little demo of give sampling for image noise that runs in [00:11:56] sampling for image noise that runs in your browser [00:11:57] your browser okay so the idea is that here is an [00:11:59] okay so the idea is that here is an image and if you [00:12:01] image and if you uh hit control enter here [00:12:04] uh hit control enter here um you'll see that this is the input to [00:12:07] um you'll see that this is the input to the system so we have black pixels and [00:12:11] the system so we have black pixels and red pixels these are the observed pixels [00:12:14] red pixels these are the observed pixels and white [00:12:16] and white pixels are unobserved and these are the [00:12:18] pixels are unobserved and these are the ones that we want to fill in [00:12:20] ones that we want to fill in so there's a bunch of settings which [00:12:22] so there's a bunch of settings which i'll talk to about in a second but if [00:12:24] i'll talk to about in a second but if you click here you can see how get a [00:12:27] you click here you can see how get a feeling for what give sampling is doing [00:12:30] feeling for what give sampling is doing each frame here each iteration is a full [00:12:34] each frame here each iteration is a full pass over all [00:12:35] pass over all the pixels and you can see that it's [00:12:37] the pixels and you can see that it's kind of dancing around because it's [00:12:39] kind of dancing around because it's trying to explore [00:12:40] trying to explore different uh assignments [00:12:44] different uh assignments so 
one thing you can do is um you can [00:12:47] so one thing you can do is um you can set show marginals equals true [00:12:49] set show marginals equals true and what this does is that instead of [00:12:52] and what this does is that instead of visualizing the assignment at a [00:12:54] visualizing the assignment at a particular iteration for each of his [00:12:57] particular iteration for each of his pixel here i'm actually visualizing the [00:13:00] pixel here i'm actually visualizing the marginal probability estimate so this is [00:13:03] marginal probability estimate so this is in general going to be a number between [00:13:04] in general going to be a number between 0 and 1 which is represented as a shade [00:13:07] 0 and 1 which is represented as a shade between black and red here [00:13:09] between black and red here so this in some sense is the kind of [00:13:11] so this in some sense is the kind of best guess at what the reconstruction [00:13:14] best guess at what the reconstruction is [00:13:16] is so there are a number of things you can [00:13:18] so there are a number of things you can play with so for example the fraction of [00:13:20] play with so for example the fraction of missing pixels if i reduce this to let's [00:13:22] missing pixels if i reduce this to let's say 0.3 [00:13:24] say 0.3 then [00:13:26] then you know the problem becomes easier and [00:13:28] you know the problem becomes easier and you can see that the reconstruction gets [00:13:30] you can see that the reconstruction gets some you know pretty reasonable [00:13:32] some you know pretty reasonable results another fun thing you can play [00:13:35] results another fun thing you can play with is um well actually let me let me [00:13:38] with is um well actually let me let me bring down the map [00:13:40] bring down the map bring up the missing fraction to one [00:13:43] bring up the missing fraction to one okay so that means i don't see any pixel [00:13:45] okay so that means i don't 
see any pixel so [00:13:47] so here um this is just going to be [00:13:49] here um this is just going to be actually i mean let me do [00:13:51] actually i mean let me do show margins equals false [00:13:53] show margins equals false oops [00:13:58] so here you can see kind of just blind [00:14:00] so here you can see kind of just blind samples from [00:14:02] samples from the model [00:14:03] the model okay and [00:14:05] okay and if i pop up the coherence if i bump it [00:14:08] if i pop up the coherence if i bump it down [00:14:09] down um then you'll see kind of a more random [00:14:11] um then you'll see kind of a more random pattern [00:14:13] pattern if i bump it up to 10 [00:14:15] if i bump it up to 10 then you'll see kind of more coherence [00:14:18] then you'll see kind of more coherence so remember this is kind of like the [00:14:19] so remember this is kind of like the phase transitions that we saw for the [00:14:22] phase transitions that we saw for the easy [00:14:24] easy okay so i will let you [00:14:26] okay so i will let you play with this on your phone [00:14:29] play with this on your phone let me just conclude here uh actually [00:14:32] let me just conclude here uh actually one thing before we [00:14:33] one thing before we so [00:14:34] so let me try to go back to iterated [00:14:36] let me try to go back to iterated conditional modes and compare that with [00:14:38] conditional modes and compare that with give samples both of them have the same [00:14:41] give samples both of them have the same kind of template you're working with [00:14:43] kind of template you're working with complete assignments and you're going [00:14:44] complete assignments and you're going through each variable and updating the [00:14:46] through each variable and updating the assignment to that variable one at a [00:14:49] assignment to that variable one at a time [00:14:49] time but there's a few differences here [00:14:52] but there's a few differences here one 
the first salient one is that [00:14:55] one the first salient one is that idea of conditional modes was for [00:14:56] idea of conditional modes was for solving csps where we're trying to find [00:14:58] solving csps where we're trying to find the maximum weight assignment dip [00:15:00] the maximum weight assignment dip sampling is for markup networks where [00:15:01] sampling is for markup networks where we're trying to compute marginal [00:15:03] we're trying to compute marginal probabilities [00:15:05] probabilities so as a consequence for icm [00:15:08] so as a consequence for icm we at each step we're choosing the value [00:15:13] we at each step we're choosing the value to sign to a variable which maximizes [00:15:16] to sign to a variable which maximizes its weight whereas in give sampling [00:15:19] its weight whereas in give sampling we're [00:15:20] we're using the weights to form a distribution [00:15:22] using the weights to form a distribution and sampling from that distribution [00:15:25] and sampling from that distribution in icm we notice that [00:15:27] in icm we notice that the algorithm does converge but often to [00:15:30] the algorithm does converge but often to a local optimum which is not the best [00:15:33] a local optimum which is not the best maximum weight assignment [00:15:35] maximum weight assignment for gift sampling [00:15:37] for gift sampling as you can see from these samples [00:15:38] as you can see from these samples there's no traditional notions of [00:15:40] there's no traditional notions of convergence then the samples are going [00:15:42] convergence then the samples are going to keep on changing and keep on changing [00:15:44] to keep on changing and keep on changing so the iterates are not the ones which [00:15:46] so the iterates are not the ones which are converging what is actually going to [00:15:48] are converging what is actually going to converge are the marginal estimates [00:15:52] converge are the marginal 
estimates and [00:15:53] and in under some technical assumptions [00:15:56] in under some technical assumptions these estimates are actually going to [00:15:58] these estimates are actually going to converge to the correct answer [00:16:00] converge to the correct answer we saw that for object tracking it did a [00:16:02] we saw that for object tracking it did a pretty good job there [00:16:04] pretty good job there um but there were some kind of technical [00:16:05] um but there were some kind of technical conditions um one sufficient condition [00:16:08] conditions um one sufficient condition is that all the weights uh be [00:16:10] is that all the weights uh be positive [00:16:12] positive but more [00:16:13] but more uh generally what we need is that for [00:16:16] uh generally what we need is that for the probability of going from one [00:16:18] the probability of going from one assignment to another assignment via [00:16:19] assignment to another assignment via give sampling has positive probability [00:16:21] give sampling has positive probability because if you have two disconnected um [00:16:25] because if you have two disconnected um regions then you can't if you start keep [00:16:28] regions then you can't if you start keep sampling at one particular point then [00:16:29] sampling at one particular point then you will never reach the other point [00:16:32] you will never reach the other point so one important caveat is skip sampling [00:16:36] so one important caveat is skip sampling is wonderful but [00:16:37] is wonderful but it in the worst case it does take [00:16:39] it in the worst case it does take exponential time so these are really [00:16:41] exponential time so these are really computing margin probabilities is a [00:16:42] computing margin probabilities is a really hard problem and gibb sampling is [00:16:45] really hard problem and gibb sampling is just you know a heuristic with some [00:16:48] just you know a heuristic with some nice 
[00:16:49] nice asymptotic guarantee [00:16:53] so wrapping up [00:16:54] so wrapping up um we looked at [00:16:56] um we looked at computing the marginal probabilities of [00:17:00] computing the marginal probabilities of a markov network [00:17:02] a markov network and we saw that gibbs sampling did this [00:17:06] and we saw that gibbs sampling did this by sampling one variable at a time [00:17:09] by sampling one variable at a time and it counts visitations [00:17:12] and it counts visitations to each of the values for a given [00:17:15] to each of the values for a given variable [00:17:17] variable and it's one of these kind of [00:17:18] and it's one of these kind of astonishing things that give sampling [00:17:21] astonishing things that give sampling is so carefully constructed that it [00:17:23] is so carefully constructed that it actually kind of works and you can prove [00:17:25] actually kind of works and you can prove lots of interesting theorems about it [00:17:28] lots of interesting theorems about it finally gibbs family is just the first [00:17:31] finally gibbs family is just the first taste of a much more broad class of [00:17:34] taste of a much more broad class of techniques called markov chain monte [00:17:36] techniques called markov chain monte carlo which are used to [00:17:40] carlo which are used to produce [00:17:41] produce much kind of richer ways of estimating [00:17:44] much kind of richer ways of estimating probabilities in markup [00:17:47] probabilities in markup all right that's the end of this module ================================================================================ LECTURE 032 ================================================================================ Bayesian Networks 1 - Overview | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=fA7zP6EcVdw --- Transcript [00:00:05] hi in this module i'm going to talk [00:00:07] hi in this module i'm going to talk about bayesian networks and new 
modeling [00:00:09] about bayesian networks and new modeling paradigm [00:00:10] paradigm so we have talked about two types of [00:00:12] so we have talked about two types of variable based models the first was [00:00:14] variable based models the first was constraint satisfaction problems where [00:00:16] constraint satisfaction problems where the objective is to find the maximum [00:00:18] the objective is to find the maximum weight assignment given a factor graph [00:00:20] weight assignment given a factor graph then we talked about markup networks [00:00:22] then we talked about markup networks where we used factor graphs to define a [00:00:24] where we used factor graphs to define a joint probability distribution over [00:00:26] joint probability distribution over assignments and we were computing [00:00:28] assignments and we were computing marginal probabilities [00:00:31] marginal probabilities now we're going to talk about bayesian [00:00:32] now we're going to talk about bayesian networks where we still define a [00:00:34] networks where we still define a distribution over [00:00:36] distribution over a set of random variables using a factor [00:00:38] a set of random variables using a factor graph but now the factors are going to [00:00:40] graph but now the factors are going to have special meaning [00:00:42] have special meaning the bayesian networks were developed by [00:00:43] the bayesian networks were developed by judea pearl in the mid 1980s and really [00:00:47] judea pearl in the mid 1980s and really have evolved into the more general [00:00:49] have evolved into the more general notion of generative modeling that we [00:00:51] notion of generative modeling that we see today in machine [00:00:53] see today in machine so quickly before diving into vision [00:00:56] so quickly before diving into vision networks it's helpful to compare and [00:00:58] networks it's helpful to compare and contrast with markov networks [00:01:01] contrast with markov 
networks so both [00:01:02] so both are going to define a probability [00:01:03] are going to define a probability distribution over assignments to a set [00:01:06] distribution over assignments to a set of random variables [00:01:08] of random variables but the way that each approaches this is [00:01:10] but the way that each approaches this is very different [00:01:11] very different so if you're defining a markov network [00:01:14] so if you're defining a markov network you tend to think in terms of specifying [00:01:17] you tend to think in terms of specifying a set of preferences [00:01:18] a set of preferences and you throw these factors encoding [00:01:21] and you throw these factors encoding these preferences into [00:01:23] these preferences into the markov network so for example last [00:01:25] the markov network so for example last time we just threw [00:01:26] time we just threw in the transition factor and observation [00:01:28] in the transition factor and observation vector for the object tracking example [00:01:31] vector for the object tracking example so the bayesian network is going to [00:01:33] so the bayesian network is going to require a more coordinated approach [00:01:36] require a more coordinated approach so in a bayesian network the factors are [00:01:37] so in a bayesian network the factors are going to be local conditional [00:01:39] going to be local conditional distributions as we'll see later and we [00:01:42] distributions as we'll see later and we really think about a generative process [00:01:44] really think about a generative process by which each of these variables is set [00:01:47] by which each of these variables is set based on other variables in turn [00:01:52] so there are many applications of [00:01:54] so there are many applications of bayesian networks um or more generally [00:01:56] bayesian networks um or more generally generated models [00:01:58] generated models so i'll just go through a couple of them [00:02:00] 
so i'll just go through a couple of them here so the first one is topic modeling [00:02:02] here so the first one is topic modeling where the goal is to discover hidden [00:02:05] where the goal is to discover hidden structure in a large collection of [00:02:07] structure in a large collection of documents so an example of topic [00:02:09] documents so an example of topic modeling is latent dirichlet allocation [00:02:11] modeling is latent dirichlet allocation or lda [00:02:13] or lda and lda posits that each document is [00:02:16] and lda posits that each document is generated by [00:02:18] generated by drawing a mixture of topics and then [00:02:20] drawing a mixture of topics and then generating the words given those topics [00:02:24] generating the words given those topics another interesting example is this idea [00:02:27] another interesting example is this idea of vision as inverse graphics [00:02:29] of vision as inverse graphics so much of computer vision today [00:02:31] so much of computer vision today is [00:02:33] is taking images and processing them in [00:02:35] taking images and processing them in some way to generate semantic [00:02:37] some way to generate semantic descriptions such as object categories [00:02:39] descriptions such as object categories or scene descriptions [00:02:41] or scene descriptions so vision is inverse graphics takes a [00:02:44] so vision is inverse graphics takes a very different approach [00:02:46] very different approach where we specify using laws of physics a [00:02:50] where we specify using laws of physics a graphics engine that can generate an [00:02:52] graphics engine that can generate an image given some semantic description [00:02:54] image given some semantic description for example a 3d model of an object [00:02:57] for example a 3d model of an object and then given this model [00:03:00] and then given this model computer vision is just [00:03:02] computer vision is just inverse graphics where we're trying 
to [00:03:05] inverse graphics where we're trying to recover the semantic description [00:03:08] recover the semantic description using [00:03:09] using the image as input [00:03:11] the image as input so this is an example of inference on [00:03:14] so this is an example of inference on this generative model [00:03:16] this generative model so while this idea hasn't really been [00:03:19] so while this idea hasn't really been able to scale the scale past some [00:03:21] able to scale the scale past some limited examples it's i think a very [00:03:24] limited examples it's i think a very tantalizing idea nonetheless [00:03:26] tantalizing idea nonetheless so switching gears a little bit let's [00:03:28] so switching gears a little bit let's talk about communication networks [00:03:31] talk about communication networks so [00:03:32] so in the communication networks nodes must [00:03:34] in the communication networks nodes must send messages [00:03:36] send messages just a sequence of bits to each other [00:03:38] just a sequence of bits to each other but these bits can get corrupted along [00:03:41] but these bits can get corrupted along the way due to [00:03:42] the way due to physics [00:03:43] physics so the idea behind error correcting [00:03:45] so the idea behind error correcting codes [00:03:46] codes more in particular these things called [00:03:48] more in particular these things called low density parity codes is that the [00:03:50] low density parity codes is that the sender sends a random parity checks on [00:03:53] sender sends a random parity checks on the data bits [00:03:55] the data bits and then the receiver obtains a noisy [00:03:57] and then the receiver obtains a noisy version of both the data and the parity [00:03:59] version of both the data and the parity bits the bayesian network defines how [00:04:02] bits the bayesian network defines how the original bits are related to the [00:04:05] the original bits are related to the noisy bits and 
then the receiver can use [00:04:08] noisy bits and then the receiver can use bayesian inference to compute and [00:04:10] bayesian inference to compute and recover the original bits so this is [00:04:12] recover the original bits so this is actually a very effective idea that's [00:04:13] actually a very effective idea that's used in practice [00:04:16] used in practice the final example is either [00:04:20] the final example is either controversial or a little bit grim [00:04:22] controversial or a little bit grim which i'll explain later so this this is [00:04:24] which i'll explain later so this this is a problem of dna matching [00:04:27] a problem of dna matching so there are two use cases of this [00:04:30] so there are two use cases of this one is in forensics so given dna found [00:04:33] one is in forensics so given dna found at a crime site [00:04:35] at a crime site even if the suspect's dna is not in the [00:04:38] even if the suspect's dna is not in the database [00:04:39] database one can still match this dna against the [00:04:42] one can still match this dna against the family members of a subject and here the [00:04:45] family members of a subject and here the bayesian network is structured along the [00:04:47] bayesian network is structured along the family tree [00:04:48] family tree and [00:04:49] and specifies the relationship between the [00:04:51] specifies the relationship between the family members dna due to using [00:04:54] family members dna due to using mendelian inheritance [00:04:56] mendelian inheritance so now while this technology has [00:04:58] so now while this technology has actually been used to solve a number of [00:05:00] actually been used to solve a number of crime cases there's definitely a lot of [00:05:03] crime cases there's definitely a lot of tricky ethical concerns about this [00:05:05] tricky ethical concerns about this expanded dna matching especially when an [00:05:08] expanded dna matching especially when an 
individual's decision to release their [00:05:10] individual's decision to release their own dna can impact the privacy of family [00:05:13] own dna can impact the privacy of family members [00:05:15] members the second use case is in disaster [00:05:17] the second use case is in disaster victim identification so after a big [00:05:20] victim identification so after a big airplane crash or some other disaster [00:05:23] airplane crash or some other disaster for example malaysia airlines crashed in [00:05:25] for example malaysia airlines crashed in ukraine in 2014 [00:05:27] ukraine in 2014 a victim's dna is found at the crash [00:05:29] a victim's dna is found at the crash site and is matched against the family [00:05:32] site and is matched against the family members using the same mechanism as i [00:05:34] members using the same mechanism as i just described to help identify victims [00:05:37] just described to help identify victims and these methods are very scalable [00:05:40] and these methods are very scalable which allows them to [00:05:42] which allows them to deal with well these unfortunate large [00:05:45] deal with well these unfortunate large crash sites [00:05:48] so why bayesian networks [00:05:51] so why bayesian networks well these days it's kind of hard not to [00:05:53] well these days it's kind of hard not to think about problems exclusively through [00:05:56] think about problems exclusively through the lens of standard supervised learning [00:05:58] the lens of standard supervised learning such as just train a deep neural network [00:06:00] such as just train a deep neural network on the pile of data [00:06:02] on the pile of data vision networks really operate in a very [00:06:04] vision networks really operate in a very different paradigm which offers several [00:06:06] different paradigm which offers several advantages that i want to underscore [00:06:08] advantages that i want to underscore here [00:06:10] here so the first [00:06:11] so 
So the first is that they can handle heterogeneously missing information. Normally, when you're doing standard supervised learning, your data is fairly homogeneous: you have training examples, input-output pairs, both at training and test time. But in cases where you have missing information, or you have auxiliary information, Bayesian networks can gracefully handle this missingness in a way that's a little bit more challenging for traditional supervised methods.

[00:06:42] The second is that Bayesian networks allow you to incorporate prior knowledge much more easily. For example, if you understand how Mendelian inheritance works on DNA, or you understand the laws of physics, then Bayesian networks provide a nice language for incorporating this information into your model. And now, using this model, you can actually learn from very few samples and extrapolate
beyond the training distribution, whereas in contrast many model-agnostic, low-inductive-bias methods, such as deep neural networks, require much more data to be effective.

[00:07:18] Because you're specifying prior knowledge, you can also interpret the variables inside the Bayesian network, and this can be useful for understanding why a model is making a certain decision. You can introspect and ask questions about any of the intermediate variables, and this just follows from the laws of probability.

[00:07:40] Finally, Bayesian networks are an important precursor to causal models. These are beyond the scope of this course, but they are extremely important, especially these days. They allow you to answer questions about interventions (for example, what would happen if we give this drug to this patient?) and counterfactuals (what would have happened if we had given this drug?).
[00:08:03] So these questions are extremely tricky and deep, and standard machine learning, or any method that views the world through just the lens of prediction, is really inadequate to answer them. We're not going to talk about this in this course, but I highly encourage you to explore this topic on your own.

[00:08:21] So finally, Bayesian networks obviously aren't a panacea. In many situations, often in the canonical AI applications such as vision, speech, and language, we actually have large datasets, we mostly care about prediction, and it's extremely hard to incorporate prior knowledge into your models in these very complex domains. So in these cases, Bayesian networks haven't been as successful and have largely been supplanted by deep learning approaches. But still, having Bayesian networks in your toolkit will allow you to use them effectively
when you discover the right problem.

[00:09:02] So in the remaining modules on Bayesian networks, I will first introduce Bayesian networks more formally, and then I'll talk about probabilistic programming, which is a way to define Bayesian networks using probabilistic programs; this is a really cool way to think about modeling.

[00:09:21] Then we'll turn to inference. I'll talk about what inference means: computing conditional and marginal probabilities. We're actually going to reduce the inference problem in Bayesian networks to the same problem of probabilistic inference in Markov networks, allowing us to leverage what we covered when we talked about Markov networks.

[00:09:44] Then we're going to specialize to hidden Markov models (HMMs), an important special case of Bayesian networks. We're going to show that the forward-backward algorithm can leverage the chain structure of an
HMM, allowing you to do exact probabilistic inference efficiently.

[00:09:59] Then we're going to talk about particle filtering, which allows you to do approximate inference and scale up to HMMs whose variables have larger domains.

[00:10:09] Finally, we're going to talk about learning in Bayesian networks. We're going to start with supervised learning, where all the variables are observed; this actually turns out to be quite easy, and you'll be pleasantly surprised. Then we're going to show you how to guard against overfitting using Laplace smoothing, and finally we're going to turn to cases where not all the variables are observed and introduce the EM algorithm, which will help us learn in such Bayesian networks.

[00:10:35] Okay, so let's jump in.

================================================================================
LECTURE 033
================================================================================
Bayesian Networks 2 - Definition | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=xvC6XmZmR_U
---
Transcript

[00:00:05] Hi. In this module I'm
going to present the formal definition of Bayesian networks, give a few examples, and then talk about a really interesting property called explaining away.

[00:00:15] Before we begin, I want to review some basic probability. Suppose we have some random variables: one called S, which represents whether there's sunshine, and another called R, which represents whether there's rain. You should think about a setting of values to S and R as capturing some state of the world. We don't know which state of the world we're in, so we're going to capture this uncertainty using a joint distribution.

[00:00:47] So the joint distribution over S and R, P(S, R), is equal to a table, and this table specifies, for every possible assignment to S and R, a probability associated with it. So for example, the
probability of no sun and no rain is 0.2; no sun and rain is 0.08; and so on and so forth. Notice that I'm using lowercase letters to denote values and uppercase letters to denote the random variables. Also notice that the quantity with lowercase values is a number, a probability, whereas the quantity with uppercase variables is a table.

[00:01:31] So the joint distribution captures everything that you really want to know; you can think about it as a probabilistic database that captures how the world works. And now we can use the joint distribution to answer all sorts of interesting questions.

[00:01:49] We can compute what is called a marginal distribution. The idea here is: suppose I'm interested only in whether there's sunshine; I don't care about whether it's raining. So we can compute P(S), and this is a table which specifies the possible values of S (0 and 1) and the marginal probability of
that particular value. So how do we compute this? We simply aggregate rows. In particular, we look at S = 0, look at the joint distribution, and match all the rows where S is 0 (these two) and add them up; that gives us 0.28. Then for S = 1, the matching rows are the last two rows, and that gives us 0.72. That's the marginal distribution over S.

[00:02:39] There's also another concept called the conditional distribution. Here the idea is: suppose I knew it was raining, so I'm going to condition on R = 1, and now I want to know the probability of sunshine. This is a table where I again specify the possible values of S (0 and 1), and I want to know the conditional probability of S given R = 1. The way I want to approach the conditional distribution is as follows.
[00:03:12] I have this condition R = 1; that means I'm going to effectively remove all the rows where R does not equal 1, and I'm left with these two. Now I'm going to simply normalize this distribution: I have 0.08 and 0.02, the sum is 0.1, so I divide by that sum, and that gives me 0.8 and 0.2 for the values of S given R = 1.

[00:03:47] So all we did was select the rows that match the condition and normalize to get the distribution. And now just a simple note: the normalization constant, which is the sum of these two, is actually the marginal probability of R = 1. You can check that the conditional, by definition, is equal to the joint divided by the marginal.

[00:04:21] So let's expand our example a little bit. Suppose we now have variables sunshine, rain, traffic, and autumn.
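Before moving on, the two operations on the two-variable table, aggregating rows for a marginal and selecting-then-normalizing for a conditional, can be sketched in a few lines of Python. This is a minimal sketch; the 0.7 entry for (sun, no rain) is inferred so that the marginals match the 0.28 / 0.72 stated in the lecture.

```python
# Joint distribution P(S, R) as a table: (s, r) -> probability.
joint = {
    (0, 0): 0.2,   # no sun, no rain
    (0, 1): 0.08,  # no sun, rain
    (1, 0): 0.7,   # sun, no rain (inferred entry)
    (1, 1): 0.02,  # sun, rain
}

# Marginal P(S): for each value s, aggregate the rows that agree on s.
p_s = {s: sum(p for (s2, _), p in joint.items() if s2 == s) for s in (0, 1)}
# p_s[0] = 0.28, p_s[1] = 0.72

# Conditional P(S | R = 1): keep the rows matching the condition, then normalize.
selected = {s: p for (s, r), p in joint.items() if r == 1}
z = sum(selected.values())                 # normalization constant = P(R = 1) = 0.1
p_s_given_r1 = {s: p / z for s, p in selected.items()}
# p_s_given_r1[0] = 0.8, p_s_given_r1[1] = 0.2
```

Note that the normalization constant `z` is exactly the marginal probability of the evidence, matching the remark that the conditional equals the joint divided by the marginal.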
[00:04:33] So we have a joint distribution over all four variables. And marginalization and conditioning are not mutually exclusive; we can actually define queries that involve both. Here is an example: suppose I know that there's traffic and that we're in the autumn quarter, and now we're interested in a particular query variable, in this case R. This question can be written as follows: the probability of the query variable R conditioned on the evidence T = 1, A = 1. The variables which are not mentioned here are said to be marginalized out; S is not mentioned here, so we're marginalizing out S.

[00:05:19] So in general, there are three sets of variables: the query variables, the conditioning variables, and the marginalized-out variables, which form a partition of all the variables in your system.
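To make this three-way partition concrete, here is a sketch of a generic query routine that selects rows matching the evidence, sums out every unmentioned variable, and normalizes. The four-variable joint below uses made-up numbers purely for illustration; the lecture does not give this table.

```python
import itertools

def query(joint, names, query_var, evidence):
    """P(query_var | evidence): select rows matching the evidence,
    marginalize out all other variables, then normalize."""
    totals = {}
    for assignment, p in joint.items():
        row = dict(zip(names, assignment))
        if all(row[v] == val for v, val in evidence.items()):
            totals[row[query_var]] = totals.get(row[query_var], 0.0) + p
    z = sum(totals.values())
    return {val: p / z for val, p in totals.items()}

names = ("S", "R", "T", "A")
# Hypothetical joint over the 16 assignments (illustrative numbers only).
raw = {vals: 1 + sum(v * (i + 1) for i, v in enumerate(vals))
       for vals in itertools.product((0, 1), repeat=4)}
total = sum(raw.values())
joint = {vals: w / total for vals, w in raw.items()}

# P(R | T = 1, A = 1); S is unmentioned, so it is marginalized out.
answer = query(joint, names, "R", {"T": 1, "A": 1})
```

The query variable, the conditioning variables `{T, A}`, and the marginalized-out variable `S` together cover all four variables, which is exactly the partition described above.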
[00:05:42] So now let's turn to a classic puzzle, which we will solve using Bayesian networks. Suppose that in the world there are unfortunate things such as earthquakes and burglaries, and suppose that they're independent events, and hopefully rare: each of them happens with probability epsilon, where epsilon is some small number. You've installed an alarm which will go off whenever either there's an earthquake or there's a burglary; you got some special deal where it's a two-in-one kind of alarm.

[00:06:15] So suppose you're away on vacation, and then you get a notification that your alarm went off. Now, does hearing that there's an earthquake on the radio or on your news feed increase, decrease, or keep constant the probability of a burglary? That is, if you know that there's an earthquake in addition to the alarm, how does that change your beliefs about a burglary?
[00:06:48] Now we could try to intuit the answer, and I would encourage you to do that and see if you're right, but sometimes this can be very slippery, because the right answer can be counterintuitive.

[00:07:03] You might think: well, earthquakes and burglaries, I said they're independent, so knowing that there's an earthquake, why should that change the probability of a burglary? That's one way to think about it, but that turns out to be wrong, and I'll show you why.

[00:07:18] So let's try to tackle this problem using Bayesian networks. We're going to define a joint distribution over earthquake, burglary, and alarm. I'll do this in the next slide, but first let's talk about the questions; let's convert this word problem into mathematical notation. The two things I want to compare are: what is the probability of there being a burglary given only that I heard an alarm,
[00:07:46] versus what is the probability of a burglary given that I heard an alarm and I heard that there's an earthquake? Is it smaller, is it the same, or is it larger? That's what I want to know.

[00:07:59] So now let us define the Bayesian network completely. There are going to be four steps to thinking about how to define a Bayesian network. First of all, let's figure out what the variables are. The variables are whether there's a burglary (B), whether there's an earthquake (E), and whether the alarm went off (A).

[00:08:22] Second, what we're going to do is model the dependencies between these variables using directed arrows. You can think about them as capturing causality, although that's not necessarily the case. These are meant to just capture qualitative relationships: here, the alarm is triggered either by the
burglary or an earthquake, so that seems sensible.

[00:08:55] So to make this qualitative relationship quantitative, I'm going to define a local conditional distribution for each variable, conditioned on its parents. Let's go through these examples. We have B, a variable; a local conditional distribution specifies, for each possible value of B, what its probability is. I said that the probability of burglary is epsilon, and that means the probability of no burglary is 1 minus epsilon. Then we look at E. E has no parents as well, so the probability of earthquake is epsilon and the probability of no earthquake is 1 minus epsilon.

[00:09:40] I can also write these conditional distributions as follows: I can write p(b) = epsilon · [b = 1] + (1 − epsilon) · [b = 0], where [·] denotes the indicator function.
So if I plug in b = 1, I'm going to get epsilon, and if I plug in b = 0, I'm going to get 1 − epsilon; and the same goes for p(e).

[00:10:02] So now, what is the probability of A given its parents? It's easiest to write it mathematically like this: p(a | b, e) = [a = (b ∨ e)], just the indicator of whether a equals b OR e. So this is a deterministic relationship, but I've lifted it into this probabilistic notation.

[00:10:26] I can also write this out as a table, where I specify, for every possible configuration of the parents and of A itself, what its probability is. So here, if b and e are 0: does a = 0 equal 0 OR 0? Yes, so this probability is 1. Does a = 1 equal 0 OR 0? No, so that's a 0. Does a = 0 equal 0 OR 1? That's also a no. And does a = 1 equal 0 OR 1? The answer is yes,
so that's a 1. You can fill in the rest of the table analogously.

[00:11:13] Okay, so now I have defined a local conditional distribution for each variable given its parents. And now the final step is to multiply all of these together, and that is defined as the joint distribution over all the random variables.

[00:11:35] Notice that I'm deliberately using two types of p here: lowercase p is used to specify the local conditional probabilities, and the blackboard uppercase P is reserved for the joint distribution, and also for the derived marginal and conditional distributions. So notice again that these local conditional distributions are just defined, whereas the joint distribution is derived from the local conditional distributions.

[00:12:10] All right, so the joint distribution, like I said, is simply the product of all the local conditional distributions.
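The construction just described, local conditional distributions multiplied together into a joint, can be sketched directly in code. This is a minimal sketch assuming epsilon = 0.05 (an arbitrary small value chosen for illustration), not the lecture's own implementation.

```python
from itertools import product

eps = 0.05  # assumed small probability of a burglary / an earthquake

def p_b(b):        # local conditional for B (no parents)
    return eps if b == 1 else 1 - eps

def p_e(e):        # local conditional for E (no parents)
    return eps if e == 1 else 1 - eps

def p_a(a, b, e):  # local conditional for A given its parents: [a = (b OR e)]
    return 1.0 if a == (b | e) else 0.0

# The joint distribution is the product of all local conditional distributions.
joint = {(b, e, a): p_b(b) * p_e(e) * p_a(a, b, e)
         for b, e, a in product((0, 1), repeat=3)}

assert abs(sum(joint.values()) - 1.0) < 1e-9  # it is a valid distribution
```

Because p(a | b, e) is deterministic, half of the eight rows get probability zero, and each remaining row's probability is just p(b) · p(e).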
[00:12:21] So if I work that out, I get this table over all possible assignments to B, E, and A, with their probabilities. Okay, so now I can work on the questions I'm asking. This is my probabilistic database; let's go query it.

[00:12:35] Let's warm up with something relatively simple: what is the marginal probability of B = 1? Remember how I compute a marginal probability: I look at B = 1, which selects these rows down here, and I simply add up the probabilities. There's epsilon · (1 − epsilon), and then, adding epsilon², that gives me epsilon − epsilon² + epsilon² = epsilon.

[00:13:10] Okay, so what about the probability of burglary conditioned on the alarm? Remember, for conditional distributions, I'm going to wipe out all the rows where A is not 1.
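Both the warm-up marginal and the conditional query being set up can be checked numerically against the same joint table; again a sketch with epsilon = 0.05 chosen arbitrarily.

```python
from itertools import product

eps = 0.05  # assumed value, for illustration

# Joint over (B, E, A): product of the local conditionals, with the
# deterministic alarm A = B OR E (same table as in the lecture).
joint = {(b, e, a): (eps if b else 1 - eps) * (eps if e else 1 - eps) * (a == (b | e))
         for b, e, a in product((0, 1), repeat=3)}

# Marginal P(B = 1): add up the rows where b = 1.
p_b1 = sum(p for (b, e, a), p in joint.items() if b == 1)
# = eps(1 - eps) + eps^2 = eps

# Conditional P(B = 1 | A = 1): select rows with a = 1, then normalize.
rows = {k: p for k, p in joint.items() if k[2] == 1}
p_b1_given_a1 = (sum(p for (b, e, a), p in rows.items() if b == 1)
                 / sum(rows.values()))
# = 1 / (2 - eps)
```

The same select-and-normalize step, run with the evidence A = 1 and E = 1, answers the final query posed next.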
with all these rows which [00:13:31] so i'm left with all these rows which are consistent with my evidence of equal [00:13:33] are consistent with my evidence of equal one [00:13:35] one and now [00:13:36] and now i'm going to look at [00:13:38] i'm going to look at um probability of b equals one so that [00:13:42] um probability of b equals one so that is are these two rows [00:13:44] is are these two rows and now i add [00:13:45] and now i add um epsilon 1 minus epsilon [00:13:49] um epsilon 1 minus epsilon plus [00:13:51] plus um [00:13:52] um epsilon squared [00:13:54] epsilon squared okay and i'm going to divide by the sum [00:13:57] okay and i'm going to divide by the sum of all these three things which is [00:14:00] of all these three things which is same as the numerator plus this [00:14:01] same as the numerator plus this additional one minus epsilon times [00:14:03] additional one minus epsilon times epsilon [00:14:04] epsilon if you do the math here you get uh one [00:14:07] if you do the math here you get uh one over two minus epsilon [00:14:10] over two minus epsilon okay so this intuitively makes sense um [00:14:13] okay so this intuitively makes sense um the prior probability of a burglary is [00:14:15] the prior probability of a burglary is small but if i hear alarm then this goes [00:14:18] small but if i hear alarm then this goes up to actually a little bit over 50 [00:14:20] up to actually a little bit over 50 percent [00:14:23] so now the final [00:14:26] so now the final question is what is the probability of [00:14:27] question is what is the probability of burglary given that i heard the alarm [00:14:31] burglary given that i heard the alarm and also i hear that there's an [00:14:33] and also i hear that there's an earthquake [00:14:34] earthquake okay so i'm conditioning on now a equals [00:14:37] okay so i'm conditioning on now a equals one and equals one so i'm going to wipe [00:14:39] one and equals one so i'm going to wipe out [00:14:40] 
out the rows where [00:14:42] the rows where e is zero [00:14:45] and now i am left with what's the [00:14:48] and now i am left with what's the probability of equals one so that is [00:14:51] probability of equals one so that is epsilon squared [00:14:53] epsilon squared divided by [00:14:55] divided by the sum over these two probabilities [00:14:57] the sum over these two probabilities which is epsilon squared plus one minus [00:14:59] which is epsilon squared plus one minus f squared epsilon and this gives me [00:15:03] f squared epsilon and this gives me if you do the math it gives me epsilon [00:15:07] if you do the math it gives me epsilon okay so this answers our question now [00:15:10] okay so this answers our question now um when i [00:15:13] um when i heard the alarm the probability [00:15:15] heard the alarm the probability of a burglary uh goes up rightfully but [00:15:19] of a burglary uh goes up rightfully but now [00:15:20] now i see that if there is an earthquake or [00:15:23] i see that if there is an earthquake or hear that there's an earthquake that [00:15:24] hear that there's an earthquake that probably goes down back to epsilon [00:15:28] probably goes down back to epsilon okay so the answer to the question is [00:15:30] okay so the answer to the question is that [00:15:31] that observing the earthquake does cause the [00:15:33] observing the earthquake does cause the problem of burglary to go down [00:15:36] problem of burglary to go down okay so let me actually work [00:15:39] okay so let me actually work convince you of this via this demo so [00:15:42] convince you of this via this demo so here um remember from this [00:15:45] here um remember from this before that we can define arbitrary [00:15:48] before that we can define arbitrary factor graphs including major networks [00:15:50] factor graphs including major networks using this tool so we have three [00:15:53] using this tool so we have three variables uh b e and a [00:15:55] variables 
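The three hand computations above can be checked mechanically by enumerating the joint table. The sketch below (plain Python, not the course's demo tool) builds the joint distribution for the alarm network from the local conditional distributions and answers the queries by summation; with epsilon = 0.05 it reproduces epsilon, 1/(2 − epsilon) ≈ 0.513, and epsilon.

```python
from itertools import product

eps = 0.05

def local(b, e, a):
    """Product of the local conditional distributions p(b) p(e) p(a | b, e)."""
    pb = eps if b else 1 - eps
    pe = eps if e else 1 - eps
    pa = 1.0 if a == (b or e) else 0.0   # the alarm is deterministic: a = b OR e
    return pb * pe * pa

# The joint distribution is a table over all assignments (b, e, a).
joint = {(b, e, a): local(b, e, a) for b, e, a in product([0, 1], repeat=3)}

def prob(query, evidence):
    """P(query | evidence): sum the consistent rows, normalize by the evidence rows."""
    rows = [(dict(zip("bea", r)), p) for r, p in joint.items()]
    den = sum(p for r, p in rows if all(r[v] == x for v, x in evidence.items()))
    num = sum(p for r, p in rows
              if all(r[v] == x for v, x in evidence.items())
              and all(r[v] == x for v, x in query.items()))
    return num / den

print(prob({"b": 1}, {}))                 # marginal: eps
print(prob({"b": 1}, {"a": 1}))           # 1 / (2 - eps)
print(prob({"b": 1}, {"a": 1, "e": 1}))   # back down to eps: explaining away
```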
[00:15:55] epsilon we're setting to 0.05 here. i'm going to define the factors, i.e., the local conditional distributions: the probability of B, the probability of E, and the probability of A given B and E. now i'm going to ask for the probability of B. if i step through this algorithm, i see that the probability of B is 0.05, which is epsilon. so now what happens when i condition on A? when i condition on A, i find that the probability of B conditioned on A = 1 is 0.51; remember, this is 1 / (2 minus epsilon). [00:16:46] finally, i'm going to also condition on the earthquake, and if i condition on the earthquake, i see that the probability of burglary goes down to 0.05, which is epsilon.

[00:17:05] okay, so what have we learned from this? you could write a flashy headline saying "earthquakes decrease burglaries." of course, this is a little tongue-in-cheek, because this is actually not a causal statement. you have to be careful: if you went in and caused some earthquakes — i don't know how you would do that, but supposing you did — it's not as if all the burglars are going to disappear. here "decrease" does not mean a causal effect; it just means that, given this evidence, the probabilities of various other variables change. so the punch line here is that dealing with all these probabilities and reasoning under uncertainty is quite slippery, and we need some sort of sound mathematical framework, such as Bayesian networks, to deliver the right answers.

[00:18:04] so this type of phenomenon is so important to Bayesian networks that it has a special name: it's called explaining away. in general, explaining away is when you have two (or more) causes that positively influence an effect; conditioned on the effect, further conditioning on one cause actually reduces the probability of the other cause. mathematically, this is written as: the probability of the other cause given the effect and one of the causes is less than the probability of that cause given just the effect alone. and this is true even if the causes are independent, which might be somewhat counterintuitive. this is kind of the hallmark of Bayesian networks; it's called a v-structure because it looks like a V.

[00:19:00] so you can rationalize this, if you want some intuition, as follows: you have this effect, and you observe A = 1, and now you're trying to seek an explanation for what caused this effect — is it B or E? just conditioning on A, well, it could be either one, so it's kind of like 50-50.
[00:19:25] but if i told you that one of the causes was actually activated, then that intuitively lessens the responsibility of the other: you don't really need this other cause to explain A, and that's why the probability of this other cause goes down. of course that is very hand-wavy, but you can rest assured that there are rigorous mathematical calculations behind it, which we just did.

[00:19:55] okay, so let's look at another example. this is kind of a toy medical diagnosis problem: suppose you are coughing and you have itchy eyes — do you have a cold, or is it something else? so remember there are four steps, so let's go through them in turn. the first step is to write down the random variables of interest. here we have cold (C), allergies (A), cough (H), and itchy eyes (I). so these are the variables C, A, H, and I.

[00:20:34] the second step is to draw arrows between them using prior knowledge. using really super-crude medical knowledge, i'm going to just declare that a cough could be due to either a cold or allergies, whereas itchy eyes are generally due to allergies alone, but not a cold.

[00:20:55] step three is to make this quantitative by defining local conditional distributions: for each variable, i'm going to write down a local conditional distribution given its parents. so: the probability of C (C has no parents), the probability of A (A has no parents), the probability of H given its parents C and A, and the probability of I given its parent A. okay, so i'm not going to bother to write down the actual probability distributions on the slide.

[00:21:30] step four is to multiply all these together to form the joint distribution over all the random variables. again, lowercase p is a
local conditional distribution, and blackboard P is the joint distribution.

[00:21:48] okay, so now i have this probabilistic database, and we can ask questions about it. so let's warm up, not exactly with this question but with a different question, which is: what is the probability that i have a cold if i was just coughing? okay, so let's look at this demo. here is the Bayesian network for medical diagnosis, where i've defined C, A, H, and I, and now i'm conditioning on H = 1 and I = 1 and asking for the probability of C, marginalizing out A. [00:22:32] it runs the variable elimination algorithm a few times — don't worry about that for now — and it produces the probability of C conditioned on H = 1 and I = 1, which is 0.13. sorry, i meant to only condition on H = 1; so let me do that again, and i get 0.28, so i'm going to write 0.28 here. and now, what is the probability when i condition on both H = 1 and I = 1? actually i already did this, but i'll just do it again: this is going to be 0.13.

[00:23:17] okay, so again you can rest assured that these calculations follow the laws of probability. and one thing i want to point out is that this is another case of explaining away, but slightly disguised. so here's how to think about it: i condition on I = 1, so i observe that i have itchy eyes. okay, itchy eyes are only connected to A, so that's only going to boost support for A — even though i don't condition on A, i'm getting more support for A. and now, having more support for A, A can explain the cough, which lessens the need for the cold; so that's why the probability of cold actually decreases compared to when i didn't observe itchy eyes.
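The lecture deliberately doesn't show the actual probability tables for this network, so the numbers in the sketch below are invented purely to illustrate the disguised explaining-away effect: with any reasonable choice of local conditional distributions of this shape, P(C = 1 | H = 1, I = 1) comes out smaller than P(C = 1 | H = 1).

```python
from itertools import product

# Hypothetical local conditional distributions -- NOT the lecture's numbers.
p_c = {1: 0.1, 0: 0.9}                        # p(cold)
p_a = {1: 0.1, 0: 0.9}                        # p(allergies)

def p_h(h, c, a):                             # p(cough | cold, allergies)
    on = 0.9 if (c or a) else 0.05
    return on if h == 1 else 1 - on

def p_i(i, a):                                # p(itchy eyes | allergies)
    on = 0.8 if a else 0.05
    return on if i == 1 else 1 - on

def prob_c1(evidence):
    """P(C=1 | evidence) by enumeration, marginalizing out A.

    Unobserved leaves (H or I) sum to 1 and can simply be skipped."""
    num = den = 0.0
    for c, a in product([0, 1], repeat=2):
        w = p_c[c] * p_a[a]
        for var, val in evidence.items():
            w *= p_h(val, c, a) if var == "h" else p_i(val, a)
        den += w
        if c == 1:
            num += w
    return num / den

cough_only = prob_c1({"h": 1})
cough_and_itchy = prob_c1({"h": 1, "i": 1})
print(cough_only, cough_and_itchy)   # itchy eyes lower the probability of a cold
```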
[00:24:15] okay, so you should be really kind of impressed by this kind of reasoning. it's quite subtle, even for this very small four-node Bayesian network; even qualitatively, you might find it hard to predict what will happen. so just imagine if you have a huge Bayesian network and you want to get quantitatively precise answers — you should be glad that we have Bayesian networks that allow you to answer these questions based on the laws of probability.

[00:24:54] so now let's define Bayesian networks formally. a Bayesian network is specified by a set of random variables, generically X1 through Xn, and it specifies a directed acyclic graph over these random variables; that specifies the dependencies qualitatively. and then we specify a local conditional distribution for each variable Xi given the parents of Xi. and when you multiply all these local conditional distributions together, you get the joint distribution over all the random variables. okay, so again we're using lowercase p to denote local conditional distributions and blackboard P to denote the joint distribution.

[00:25:57] so now we can look at probabilistic inference more formally as well. as always, you're given as input a Bayesian network specifying the joint distribution — this is my probabilistic database. i get some evidence, where a subset of the variables E has been observed to take on particular values, little e, and i'm interested in a set of query variables Q, which is another subset of the variables. so now probabilistic inference produces
the probability of Q conditioned on the evidence. and just to be very precise, what this means is the probability that Q = q given E = e, for each value little q. for example, "if i have a cough and itchy eyes, do i have a cold?" is expressed as the probabilistic inference question: what is the probability of a cold conditioned on coughing and itchy eyes?

[00:27:07] so this is the formal definition of probabilistic inference. the bad news is that computing this is actually going to turn out to be very computationally intractable, but we'll see algorithms that can tackle it either approximately, or exactly in special cases.

[00:27:25] so in summary, we've introduced Bayesian networks. it's important to think about the basis of Bayesian networks: these random variables, which capture the state of the world. we have directed edges between these variables, which represent directional dependencies. quantitatively, we define a local conditional distribution for each variable conditioned on its parents, and we multiply all those together to produce a joint distribution. now this joint distribution is my probabilistic database, where i can ask questions about the world — and this is the process of probabilistic inference. and hopefully, through the alarm and the medical diagnosis examples, you can appreciate how the Bayesian network framework captures certain types of reasoning patterns, such as explaining away, which might be intuitive or counterintuitive — but you can rest well at night, because this is all based on the laws of probability. okay.

================================================================================ LECTURE 034 ================================================================================ Bayesian Networks 3 - Probabilistic Programming | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=ZVk8y1zVoD4 --- Transcript [00:00:05] hi, in this module
i'm going to talk about probabilistic programming, a new way to think about defining Bayesian networks through the lens of writing programs, and this is really going to highlight the generative-process aspect of Bayesian networks.

[00:00:20] so recall that a Bayesian network is defined by a set of random variables; there are directed edges between the random variables that capture qualitative relationships; then, for every variable, we define a local conditional distribution conditioned on the parents of that variable; you multiply all these together and you get the joint distribution over all the random variables. and then, given this joint distribution as a probabilistic database, you can go and do probabilistic inference and answer all sorts of questions.

[00:00:56] so what we're going to focus on today is how to write down this joint distribution, or the Bayesian network, and we're going to look at it through the lens of programs. let's go through this example: let me write a short program that i claim is going to be equivalent to writing down either this equation or drawing this graph. so here it goes. first, i'm going to draw B from a Bernoulli distribution — you can think of Bernoulli as just a function that, when you call it, returns 1 (true) with probability epsilon. so B is going to be set to 1, or true, with probability epsilon. i'm going to independently do the same for E, and then finally i'm going to set A = B or E. so if i run this program, it's going to produce a setting of A, B, and E. so in general, a probabilistic program is just a randomized program such that, if you run it, it sets the random variables — in particular, it produces an assignment to the random variables.
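This three-line program can be written down directly. As a sketch (rejection sampling is my addition here, not the course demo), conditioning on evidence amounts to keeping only the runs consistent with it; with epsilon = 0.05, the estimate of P(B = 1 | A = 1) should land near 1/(2 − epsilon) ≈ 0.51:

```python
import random

eps = 0.05

def run_program():
    """One run of the probabilistic program: returns an assignment (b, e, a)."""
    b = 1 if random.random() < eps else 0   # b ~ Bernoulli(eps)
    e = 1 if random.random() < eps else 0   # e ~ Bernoulli(eps), independently
    a = b | e                               # a = b or e
    return b, e, a

# Rejection sampling: keep only runs consistent with the evidence a = 1,
# then read off the fraction of kept runs that have b = 1.
random.seed(0)
kept = [b for b, e, a in (run_program() for _ in range(200_000)) if a == 1]
print(sum(kept) / len(kept))   # close to 1 / (2 - eps)
```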
[00:02:16] so while you can run the program, it's useful to think about the program itself as just a mathematical construct that's used to define a distribution; in particular, the probability of the program producing a particular assignment is, by definition, the joint distribution over that assignment.

[00:02:38] so let's look at a more interesting example that showcases the convenience of using programming — this one's going to have a for loop in it. so let's say we're doing object tracking. we're going to assume that there's some object that starts at (0, 0), and then, for each time step 1 through n, with probability alpha i'm going to go right — x_(i−1) is the previous location, and i'm going to add (1, 0) to it — or, with probability 1 minus alpha, i'm going to go down. so here is the
Bayesian network corresponding to this probabilistic program; you can see that each x_i depends only on x_(i−1). the cool part is that this is a program and we can actually run it — this is implemented in JavaScript behind the scenes. you click run with alpha = 0.5, and each run produces an assignment to the random variables x1, x2, x3, x4, and so on, and we can visualize them. and you can play with alpha: let's make this 0.1 — i actually need to press ctrl-enter to save — if it's 0.1, then all the trajectories are going to be over here, and if it's 0.9, then the trajectories go the other way.

[00:04:22] so this program specifies what is called a Markov model, which is a special case of a Bayesian network where we have a chain: each variable depends only on the previous one.
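The tracking program above can be sketched in a few lines. One assumption here: the lecture doesn't spell out what "go down" adds to the location, so I take it to be (0, 1), mirroring the (1, 0) for "go right":

```python
import random

def sample_trajectory(n, alpha, seed=None):
    """One run of the tracking program: a chain x_0 .. x_n where each step
    goes right (+1, 0) with probability alpha, else down (+0, 1) (assumed)."""
    rng = random.Random(seed)
    x, y = 0, 0
    traj = [(x, y)]                 # the object starts at (0, 0)
    for _ in range(n):
        if rng.random() < alpha:
            x += 1                  # go right with probability alpha
        else:
            y += 1                  # otherwise go down
        traj.append((x, y))
    return traj

# With alpha = 0.9 most steps go right; with alpha = 0.1 most go down.
print(sample_trajectory(10, 0.9, seed=0))
```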
With this Markov model we can ask particular questions [00:04:42], for example: what are the possible trajectories given the evidence x10 = 2? Here I'm going to condition on x10 = 2, and if I run this, I'm sampling from all the program traces, restricted to only those where x10 is clamped to 2. So this is a way to visualize the conditional distribution of a probabilistic program.
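Conditioning by restricting program traces can be sketched as simple rejection sampling (the chain dynamics below are an assumption, as before; clamping x10 to 2 follows the lecture's example):

```python
import random

def sample_chain(alpha, n, rng):
    # assumed dynamics: up with probability alpha, else down
    xs = [0]
    for _ in range(n):
        xs.append(xs[-1] + (1 if rng.random() < alpha else -1))
    return xs

def condition(alpha=0.5, n=10, value=2, num_samples=10_000, seed=0):
    """Approximate p(trajectory | x_n = value) by rejection: run the
    program many times, keep only traces where the evidence holds."""
    rng = random.Random(seed)
    traces = (sample_chain(alpha, n, rng) for _ in range(num_samples))
    return [t for t in traces if t[-1] == value]

# every surviving trace has x10 clamped to the evidence value
kept = condition()
```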
[00:05:16] So now I'm going to quickly go through a set of examples of Bayesian networks, using probabilistic programs to write them down. This will be a fairly broad and quick overview. One application is language modeling, which is often used to score sentences for speech recognition or machine translation. Here is the probabilistic program: for each position i in the sentence, we generate a word x_i given x_{i-1}. In NLP this is called a bigram model, or more generally an n-gram model. So here we generate x1, maybe that's "wreck"; then x2 given x1, maybe that's "a"; x3 given x2, that's "nice"; and x4 given x3, that's "beach".
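A bigram model is easy to sample from given a table of conditional word probabilities. The tiny table below uses made-up probabilities, with "-BEGIN-" as an assumed start token, echoing the lecture's example:

```python
import random

# P(next word | previous word); all numbers are invented for illustration
BIGRAM = {
    "-BEGIN-":   {"wreck": 0.7, "recognize": 0.3},
    "wreck":     {"a": 1.0},
    "a":         {"nice": 0.8, "beach": 0.2},
    "nice":      {"beach": 1.0},
    "recognize": {"speech": 1.0},
}

def draw(dist, rng):
    # sample a key from a {key: probability} table
    r, acc = rng.random(), 0.0
    for word, p in dist.items():
        acc += p
        if r < acc:
            return word
    return word  # guard against floating-point round-off

def sample_sentence(rng, max_len=4):
    words, prev = [], "-BEGIN-"
    for _ in range(max_len):
        if prev not in BIGRAM:   # reached a word with no successors
            break
        prev = draw(BIGRAM[prev], rng)
        words.append(prev)
    return words

print(sample_sentence(random.Random(0)))
```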
[00:06:15] Here is an example of object tracking, which we're actually going to study at length in future modules. This is called a hidden Markov model. For every time step t = 1 to T, I generate an object location h_t; for example, for h1 I might generate (3, 1). Then I also generate a sensor reading e_t given h_t: given h1, I generate e1, which might be something like just the sum of the coordinates, for example. Then I move to the next time step: generate h2 given h1, maybe that's (3, 2); generate a sensor reading, the sum of its coordinates; and so on, generating h3 and e3, then h4 and e4, then h5 and e5. So that specifies the joint distribution over these object locations and sensor readings, and a canonical question you might want to ask is: given the sensor readings, where is the object?
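The HMM's generative story can be sketched directly. The random-walk transition below is an assumption (the lecture doesn't specify it), while the sum-of-coordinates sensor reading follows the transcript:

```python
import random

def sample_hmm(T=5, seed=0):
    """Forward-sample hidden locations h_1..h_T and readings e_1..e_T."""
    rng = random.Random(seed)
    h = (3, 1)                      # assumed initial location
    hs, es = [], []
    for _ in range(T):
        hs.append(h)
        es.append(h[0] + h[1])      # e_t depends only on h_t
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        h = (h[0] + dx, h[1] + dy)  # h_{t+1} depends only on h_t
    return hs, es

hs, es = sample_hmm()
```

The inference question "given e_1..e_T, where is the object?" runs this story in reverse; later modules cover algorithms for it.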
[00:07:28] Here is a generalization of the HMM to allow for multiple-object tracking, called a factorial HMM. Now, at every time step I have two objects, a and b, and I generate the location of object o at time step t. For example, here I have h_{1,a} and h_{1,b}, and I generate a single sensor reading that depends on both objects: e1 is conditioned on both h_{1,a} and h_{1,b}. At the next time step, I generate the object locations for the two objects and then the sensor reading conditioned on those two locations; then I transition to the third time step and generate its sensor reading, then the fourth time step and its sensor reading. In general this defines a joint distribution over all the object locations for both objects as well as the corresponding sensor readings.

[00:08:46] Here is another classic example, called naive Bayes, which is often used for very fast classification. The way naive Bayes works is that we generate a class, or label, y; for example, in document classification I might generate that this document is going to be about travel. Then for each position in the document, I generate a word w_i: for the first one I might generate "beach", for the second word maybe "Paris", and so on, all the way up to w_L. The typical way you use these naive Bayes models is that you're given a text document, which is the sequence of words, and you ask for the label: what is this document about?
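Classification with naive Bayes just inverts the generative story with Bayes' rule: P(y | w_1..w_L) is proportional to P(y) times the product of P(w_i | y). A sketch with invented labels, vocabularies, and probabilities:

```python
# All labels, words, and probabilities below are made up for illustration.
PRIOR = {"travel": 0.5, "sports": 0.5}
WORD_GIVEN_Y = {
    "travel": {"beach": 0.4, "paris": 0.4, "game": 0.1, "team": 0.1},
    "sports": {"beach": 0.1, "paris": 0.1, "game": 0.4, "team": 0.4},
}

def classify(words):
    # P(y | words) is proportional to P(y) * prod_i P(w_i | y)
    scores = {}
    for y, prior in PRIOR.items():
        p = prior
        for w in words:
            p *= WORD_GIVEN_Y[y].get(w, 1e-6)  # tiny floor for unseen words
        scores[y] = p
    total = sum(scores.values())
    return {y: p / total for y, p in scores.items()}

posterior = classify(["beach", "paris"])
```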
[00:09:43] A fancier version of the naive Bayes model is called latent Dirichlet allocation (LDA). Here we assume that a document is not just about one topic but possibly multiple topics, so I generate a distribution over topics; call it alpha. Notice that this is actually a continuous random variable: alpha might take on a value that assigns probability 0.8 to travel and 0.2 to Europe. Then, for each position i in the document, I generate a topic z_i; here I might generate "travel" for z1. Then I generate a word given that topic, so w1 given z1 might be "beach". I move on to the next position, generate a topic, generate a word given that topic, and so on and so forth until I reach the end of the document. The typical way you would use LDA is that you're given a text document, the words here, and you ask: what topics is it about?
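LDA's per-document story can be sketched as follows. In full LDA the topic distribution is itself drawn from a Dirichlet prior; here it is fixed by hand, and all tables are invented for illustration:

```python
import random

def draw(dist, rng):
    # sample a key from a {key: probability} table
    r, acc = rng.random(), 0.0
    for k, p in dist.items():
        acc += p
        if r < acc:
            return k
    return k

def sample_document(theta, word_given_topic, length, rng):
    """For each position: draw a topic z_i ~ theta, then w_i ~ P(w | z_i)."""
    doc = []
    for _ in range(length):
        z = draw(theta, rng)
        w = draw(word_given_topic[z], rng)
        doc.append((z, w))
    return doc

theta = {"travel": 0.8, "europe": 0.2}       # per-document topic mixture
word_given_topic = {
    "travel": {"beach": 0.6, "hotel": 0.4},
    "europe": {"paris": 0.7, "euro": 0.3},
}
doc = sample_document(theta, word_given_topic, 6, random.Random(0))
```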
[00:10:55] I want to infer the topics for each of the words, but also the topic distribution for the document. Here is another example, which generalizes the Bayesian network that we actually saw in a previous module. In general, suppose you have a bunch of diseases. For each disease i we generate d_i, the activity of that disease: we might have pneumonia (generate a 1), cold, and malaria. And we have a set of symptoms, where for each symptom j we generate its activity s_j: we might have fever, which depends on the diseases; cough, which depends on the set of diseases; and vomiting, which depends on the diseases. Now, the way you typically use this Bayesian network is that a patient comes in and reports some symptoms.
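A generative sketch of this disease–symptom network, with all priors and conditional probabilities invented for illustration (each symptom's probability simply rises when any parent disease is active):

```python
import random

def sample_patient(rng):
    # each disease is an independent Bernoulli (invented prior 0.1)
    diseases = {d: int(rng.random() < 0.1)
                for d in ("pneumonia", "cold", "malaria")}
    active = any(diseases.values())
    # each symptom depends on the set of diseases (invented probabilities)
    symptoms = {s: int(rng.random() < (0.8 if active else 0.05))
                for s in ("fever", "cough", "vomiting")}
    return diseases, symptoms

diseases, symptoms = sample_patient(random.Random(0))
```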
[00:12:06] You then ask the question: what diseases might they have? I'll just point out that this is a case where missing information can be handled naturally: if you didn't record a particular symptom for a patient, you can just ignore that variable.

[00:12:27] Here is another example. The motivation is that you have a social network, and you want to analyze why certain people are connected with other people. The model is formally called a stochastic block model. The idea is that for each person we generate a type for that person; maybe we have three people: a politician, a scientist, and another scientist. Then, for every pair of people, we generate whether those two people are connected: e_ij is a boolean. This politician and this scientist might be connected, so there's a 1, and the generation of each edge depends only on the types
of the two people in consideration. [00:13:21] So persons two and three are scientists and they're connected, while this politician and this scientist are not connected. Remember, we are given the social network, which is just the connectivity structure, these e's, and we're asked: what is the probability of the people being of certain types?
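The stochastic block model's story, sketched with invented type priors and edge probabilities (same-type pairs assumed more likely to connect):

```python
import random

def sample_network(n, rng):
    # draw a type per person, then each edge depends only on the two types
    types = [rng.choice(("politician", "scientist")) for _ in range(n)]
    def p_edge(t1, t2):
        return 0.8 if t1 == t2 else 0.1  # invented probabilities
    edges = {(i, j): int(rng.random() < p_edge(types[i], types[j]))
             for i in range(n) for j in range(i + 1, n)}
    return types, edges

types, edges = sample_network(3, random.Random(0))
```

Inference runs the story backwards: given only the observed edges e_ij, infer the posterior over the hidden types.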
[00:13:50] So that was a whirlwind tour of a lot of popular Bayesian network architectures in the literature, but they all basically boil down to this one: there is a variable, or a set of variables, H, which is generated first and then gives rise to a set of variables E. The probabilistic program specifies a Bayesian network: running it gives you a joint assignment, and the probability of producing that joint assignment is the joint probability. There are many, many types of models, and I've only given you a very small sample of them, but what I want you to take away is a general paradigm: you come up with stories of how the quantities of interest H generate the data E that you observe. This is really the opposite of how you normally think about machine learning or classification, where you start with the inputs and then define a sequence of operations to produce the outputs. In Bayesian networks it's often reversed: you think about the quantities of interest first, how they might arise in the world, and then how the data is generated from those quantities of interest. So this paradigm might take a little bit of getting used to, but it should become natural after some practice. All right, that's it.

================================================================================ LECTURE 035 ================================================================================ Bayesian
Networks 4 - Probabilistic Inference | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=-dGOWB9Zh8s --- Transcript [00:00:05] Hi. In this module I'm going to talk about the general strategy for performing probabilistic inference in Bayesian networks. Recall that a Bayesian network consists of a set of random variables, for example cold, allergies, cough, and itchy eyes, and defines a directed acyclic graph over these random variables that captures the qualitative dependencies between them: for example, cough is caused by cold or allergies, and itchy eyes are caused by allergies alone. Quantitatively, the Bayesian network specifies a set of local conditional distributions, one for each variable x_i given its parents. So in this example I would have the probability of c, times the probability of a, times the probability of h given c and a, times the probability of i given a.
[00:01:04] When I multiply all these probabilities together, I get, by definition, the joint probability distribution over all of the random variables; in this case, a joint distribution over C, A, H, and I. You can think of the Bayesian network as defining this joint distribution, which is like a probabilistic database where you can answer questions about it, for example: what is the probability of c given h = 1 and i = 1?
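With concrete numbers for the four local conditional distributions (invented here for illustration; the lecture doesn't give values), the "probabilistic database" view is just enumeration over the joint:

```python
from itertools import product

# invented CPTs for the cold/allergies/cough/itchy-eyes example
p_c = {0: 0.8, 1: 0.2}                                         # P(c)
p_a = {0: 0.7, 1: 0.3}                                         # P(a)
p_h1 = {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}  # P(h=1|c,a)
p_i1 = {0: 0.05, 1: 0.9}                                       # P(i=1|a)

def joint(c, a, h, i):
    # by definition, the joint is the product of the local conditionals
    return (p_c[c] * p_a[a]
            * (p_h1[(c, a)] if h else 1 - p_h1[(c, a)])
            * (p_i1[a] if i else 1 - p_i1[a]))

total = sum(joint(*v) for v in product((0, 1), repeat=4))  # sums to 1

# query the database: P(c = 1 | h = 1, i = 1)
num = sum(joint(1, a, 1, 1) for a in (0, 1))
den = sum(joint(c, a, 1, 1) for c in (0, 1) for a in (0, 1))
answer = num / den
```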
[00:01:42] Generally, you have a Bayesian network; some of the variables you observe as evidence, for example H and I in this case, and another set of variables you're interested in, which are the query variables; here Q would be C. What we want to produce is the probability of the query variables conditioned on the evidence; normally this is the probability that Q = q, for each of the values of little q.

[00:02:18] The overarching strategy that we're going to take for performing inference in Bayesian networks is to convert them into Markov networks, which we already discussed inference for. Let's walk through this example. Recall that the joint distribution over the variables here is equal to simply the product of the local conditional distributions, by definition of the Bayesian network.
[00:02:55] But these local conditional distributions are non-negative quantities, so they can be interpreted as factors in a factor graph. So let's draw the factor graph. Here we have the same set of variables, and for every variable we have a factor corresponding to its local conditional distribution: the probability of c, the probability of a, the probability of h given c and a (which connects c, a, and h), and the probability of i given a. In the factor graph representation these are simply functions: this is a function that depends on c, a, and h, and the factor graph doesn't really care that it's a local conditional distribution. Now remember, in a Markov network we take a factor graph, multiply all the factors together, and divide by the normalization constant to get the product to sum to one. But notice that in this case the normalization constant is exactly one, because we had this equality from the definition of the Bayesian network.
[00:04:05] So Z has to be one in this case: a Bayesian network is just a Markov network with normalization constant one. That means we can take any Bayesian network, reinterpret it as a Markov network, and answer all sorts of marginal queries: for example, we can ask for the probability of a, or the probability of h, and so on. I'll just remind you that a single factor connects the child and all of its parents: notice that there are two edges here, c to h and a to h, but in the factor graph representation you should connect the parents and the child into one factor.

[00:04:52] There's only one thing missing from this picture, which is that often in Bayesian networks you want to condition on evidence. So let's condition on h and i. To do this, we're going to define a Markov network over the non-conditioned variables.
[00:05:14] In this case that's going to be the probability of C = c, A = a, conditioned on H = 1 and I = 1. What I'm going to do is just substitute the values of the evidence into the factors themselves. Here is the factor graph: I have only c and a left, and p(c) and p(a) are the same as before. Now we have the factor that depended on c, a, and h, but h is equal to one, so I don't need to represent h as a variable; and the same for i = 1, so I don't need to represent i as a variable either. Now I take these four factors and multiply them all together, which gives this factor graph, and I need to normalize by 1/Z. It's a different Z now: in this case Z is not one, because I'm conditioning on evidence. In particular, Z is going to be the probability of the evidence.
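Substituting the evidence and renormalizing can be written out directly. With invented CPT numbers (the lecture gives none), Z comes out to exactly the probability of the evidence:

```python
# invented CPTs for the cold/allergies/cough/itchy-eyes example
p_c = {0: 0.8, 1: 0.2}
p_a = {0: 0.7, 1: 0.3}
p_h1 = {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}  # P(h=1|c,a)
p_i1 = {0: 0.05, 1: 0.9}                                       # P(i=1|a)

def weight(c, a):
    # product of the four factors with h = 1 and i = 1 plugged in
    return p_c[c] * p_a[a] * p_h1[(c, a)] * p_i1[a]

Z = sum(weight(c, a) for c in (0, 1) for a in (0, 1))  # = P(h = 1, i = 1)
posterior = {(c, a): weight(c, a) / Z for c in (0, 1) for a in (0, 1)}
```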
[00:06:24] You can see this because this is a conditional distribution, and a conditional distribution is equal to the joint distribution divided by the marginal of the thing you're conditioning on, so Z has to be equal to the marginal probability of the evidence. But nonetheless this is a Markov network, and now we can again run any inference algorithm we like over it, for example Gibbs sampling.

[00:06:51] So let me actually do that here. Here is the medical diagnosis example: we define the variables c, a, h, and i, and we condition on h = 1 and i = 1. We're interested in the marginal probability of c, and we're going to run Gibbs sampling. Gibbs sampling, remember, takes an arbitrary factor graph or Markov network, goes through an assignment, reassigns each variable one at a time, and accumulates counts.
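A Gibbs sampler for this reduced network only has to resample c and a, since the evidence is baked into the factors. The CPT numbers below are invented, so the estimate converges to a different value than the demo's 0.13:

```python
import random

# invented CPTs, as before
p_c = {0: 0.8, 1: 0.2}
p_a = {0: 0.7, 1: 0.3}
p_h1 = {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}
p_i1 = {0: 0.05, 1: 0.9}

def weight(c, a):
    # unnormalized p(c, a | h = 1, i = 1)
    return p_c[c] * p_a[a] * p_h1[(c, a)] * p_i1[a]

def gibbs(steps=20_000, seed=0):
    rng = random.Random(seed)
    c, a, count_c1 = 0, 0, 0
    for _ in range(steps):
        # reassign c given a, then a given c, from the local conditionals
        w0, w1 = weight(0, a), weight(1, a)
        c = int(rng.random() < w1 / (w0 + w1))
        w0, w1 = weight(c, 0), weight(c, 1)
        a = int(rng.random() < w1 / (w0 + w1))
        count_c1 += c
    return count_c1 / steps

estimate = gibbs()
exact = ((weight(1, 0) + weight(1, 1))
         / sum(weight(c, a) for c in (0, 1) for a in (0, 1)))
```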
[00:07:33] Let me speed this up a little bit and do a thousand steps at a time. Now you can see that these counts converge to the right answer, about 0.13, the probability of c conditioned on h and i.

[00:07:54] So then we're kind of done: we have a Bayesian network, we condition on evidence, we form this reduced factor graph, or Markov network, and we just run Gibbs sampling. In some sense we are done, but I want to push this a little bit further and show how we can leverage the structure of Bayesian networks to optimize things. So let's take another example where we're now conditioning only on h. Let's go through the motions: we define the Markov network over the variables that we didn't condition on, given h = 1.
[00:08:46] And the normalization constant is the probability of the evidence. Now I can ask: what is the probability of C = 1 given H = 1? This is something I can just go and compute using Gibbs sampling, but the question is: can we reduce the Markov network before running inference? Because if we can make the Markov network a little bit smaller, then hopefully inference can be a bit faster.

[00:09:17] The answer is yes, and we're going to show this by doing a little bit of algebra. Here is the Bayesian network again, where I've conditioned on H. Let me compute the marginal distribution where I've marginalized out I, so I don't have I anymore. I can express this in terms of the probability of C, A, and I, where I simply sum out all possible values of I; this is just the definition of marginal probability. Now, using the definition of the Bayesian network, I can rewrite the joint distribution in terms of local conditional distributions.

[00:10:14] Now I make an observation: I'm summing over I, but none of the terms actually depends on I except for this last factor. So I can pull all the other terms out, or equivalently push the summation inside so that it's wrapped tightly around p(i | a). And what is the sum over i of p(i | a)? By the definition of local conditional distributions, it is exactly 1, so it gets dropped. Now I have this nicer form, and not only is it smaller; let's try to understand what it is. It's the probability of C, times the probability of A, times the probability of H = 1 given C and A. So it's as if this variable I didn't exist at all.

[00:11:14] This is a general idea behind Bayesian networks: you can throw away any unobserved leaves before running inference. This is very powerful because it connects marginalization over variables, which is generally an algebraic operation involving a lot of hard work, with leaf removal, which is a graph operation and much more intuitive. In general marginalization is hard, but when there are unobserved leaves of a Bayesian network it is trivial: just remove them.

[00:11:52] Here is another type of structure we can exploit, which is actually not specific to Bayesian networks; it shows up more generally in Markov networks. Let's take another example where we're conditioning on I this time. We're going to define the Markov network, and let's write down the query we're interested in: the probability of C = c given I = 1. Expanding it out based on the definition of marginal probability, I can write it in terms of the probability of C, A, and H, where I sum over all possible values of A and H, so I'm marginalizing out A and H here. By the definition of the Bayesian network, I can replace this with the local conditional distributions. Now, using the same trick as before, I notice that H is an unobserved leaf, so I can marginalize out H and that factor disappears; graphically, H disappears.

[00:12:58] Now I'm left with this Bayesian network, and notice that the only thing that depends on C is this p(c), so I can pull it out and rewrite the expression as p(c) times some mess. The nice thing in this case is that this mess is just a constant, because it doesn't depend on C. Moreover, because p(c) is a distribution and the left-hand side is a distribution, this constant is actually 1. Graphically, C and this A-I subgraph are disconnected, which means I can simply remove that part. So in general, I can throw away any disconnected components before running inference.

[00:14:04] Okay, so let's summarize. We've tackled the problem of how to perform probabilistic inference in Bayesian networks by reducing it to inference in Markov networks. To prepare the Markov network, we first condition on the evidence, which is tantamount to substituting the values of the evidence into the factors. Then we throw away any unobserved leaves (in this case H), and we throw away any disconnected components. These last two steps are just optimizations, which are totally optional, but they'll often save you some work.
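Both optimizations can be sanity-checked by brute-force enumeration. A small sketch with hypothetical CPT numbers (the identities hold for any valid numbers, which is the point):

```python
from itertools import product

# Hypothetical CPTs for the network p(c) p(a) p(h|c,a) p(i|a).
def p_c(c): return 0.1 if c else 0.9
def p_a(a): return 0.2 if a else 0.8
def p_h(h, c, a):
    p1 = 0.9 if (c or a) else 0.05
    return p1 if h else 1 - p1
def p_i(i, a):
    p1 = 0.8 if a else 0.1
    return p1 if i else 1 - p1

def joint(c, a, h, i):
    return p_c(c) * p_a(a) * p_h(h, c, a) * p_i(i, a)

B = (0, 1)

# Unobserved leaf: P(c | h=1) from the full model ...
full = [sum(joint(c, a, 1, i) for a, i in product(B, B)) for c in B]
p_full = full[1] / sum(full)
# ... equals the answer from the reduced model p(c) p(a) p(h=1|c,a), with I removed.
red = [sum(p_c(c) * p_a(a) * p_h(1, c, a) for a in B) for c in B]
p_red = red[1] / sum(red)

# Disconnected component: after removing the leaf H, C is disconnected from
# the A-I part, so P(c | i=1) is just the prior p(c).
ci = [sum(joint(c, a, h, 1) for a, h in product(B, B)) for c in B]
p_ci = ci[1] / sum(ci)
```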
[00:14:45] Then we define a Markov network over the remaining factors, and now we just have a factor graph where we can run your favorite inference algorithm. If it's very simple, as is the case here, you can just do it manually; if what remains is more complicated, then you can do something like Gibbs sampling. And that's the end.

================================================================================ LECTURE 036 ================================================================================ Bayesian Networks 5 - Forward-backward Algorithm | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=N-ZPbpJOQs0 --- Transcript

[00:00:05] Hi, in this module I'm going to introduce the forward-backward algorithm for performing exact and efficient inference in hidden Markov models, which are an important special case of Bayesian networks.

[00:00:17] So let's revisit our object tracking example, now through the lens of HMMs. Recall that at each time step i, there's an object at a particular position h_i. The object might have
gone through this trajectory, and at each position there's also a noisy observation: 0, 2, and 2.

[00:00:42] So let's formally define a probabilistic story for how these data might occur. We start with h1, the position of the object at time step one, and we generate this position uniformly at random: probability one third for each of the possible positions 0, 1, or 2.

[00:01:07] Then I'm going to transition into the second time step. In general, I look at h_{i-1} and generate h_i, which goes up with probability one quarter, stays the same with probability one half, and goes down with probability one quarter. Mathematically, h_i can be h_{i-1} - 1, the same, or h_{i-1} + 1, with those probabilities. This transition distribution is also used to generate h3 given h2.

[00:01:43] Now, at each time step I also have an emission: e1, e2, and e3. In general, I look at the actual position h_i at time step i and generate e_i according to essentially the same process: up with probability one quarter, the same with probability one half, and down with probability one quarter. This is the local conditional distribution, formally stated.

[00:02:13] Now I multiply all of the local conditional distributions together: we have the probability of the start position h1, the probability of h_i given h_{i-1} for each subsequent time step, times the probability of the noisy sensor reading e_i given the actual position h_i, for all time steps. This gives us the joint distribution over all the actual positions and sensor readings.

[00:02:47] So now let's ask questions about our hidden Markov model. There are two types of questions which are common.
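Before turning to those queries, here is the generative story above as code. A minimal sketch: the lecture doesn't specify what happens when a step would leave the position range, so out-of-range moves are simply assigned probability zero here, which leaves the local distributions unnormalized at the boundary.

```python
POSITIONS = (0, 1, 2)
STEP = {-1: 0.25, 0: 0.5, +1: 0.25}    # down 1/4, stay 1/2, up 1/4

def p_start(h1):                        # h1 is uniform over the positions
    return 1 / 3 if h1 in POSITIONS else 0.0

def p_trans(h_prev, h):                 # p(h_i | h_{i-1})
    return STEP.get(h - h_prev, 0.0)

def p_emit(e, h):                       # p(e_i | h_i): the same noise model
    return STEP.get(e - h, 0.0)

def hmm_joint(hs, es):
    """Joint p(h, e): the product of all the local conditional distributions."""
    p = p_start(hs[0]) * p_emit(es[0], hs[0])
    for i in range(1, len(hs)):
        p *= p_trans(hs[i - 1], hs[i]) * p_emit(es[i], hs[i])
    return p
```

For example, hmm_joint((1, 1, 2), (0, 2, 2)) multiplies the six local factors p(h1), p(e1 | h1), p(h2 | h1), p(e2 | h2), p(h3 | h2), p(e3 | h3).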
[00:02:57] One is called filtering, and the other is called smoothing. A filtering question is something like this: I'm interested in the object's location at a particular time step, h2, given some evidence, namely all the sensor readings I've seen up to that point.

[00:03:20] Smoothing is similar, except that in addition I condition on the future; so I might observe e3 = 2 as well.

[00:03:32] Notice that filtering is actually a special case of smoothing if we marginalize out unobserved leaves. To show this, suppose we have just this Bayesian network, or HMM, and I didn't observe e3. Then e3 is just an unobserved leaf, and I can marginalize it out by simply removing it. Now h3 is an unobserved leaf, and I can remove that as well. So this filtering query is actually a smoothing query where there is no future, because I don't observe the future.

[00:04:16] So now let us focus on smoothing queries, without loss of generality. The forward-backward algorithm is based on dynamic programming, and the key idea is to represent the set of all assignments using a lattice. This lattice is a directed acyclic graph, not to be confused with the actual HMM or Bayesian network. There's a start state and an end state; each column represents a particular value, and each row corresponds to a particular variable. Each path through this lattice corresponds to an assignment of values to all the variables: for example, this path sets h1 = 0, h2 = 2, and h3 = 1.
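To make the correspondence concrete: a start-to-end path picks one value per row (variable), so paths and assignments match up one-to-one. A quick sketch of the counting, for the three-variable lattice above:

```python
from itertools import product

POSITIONS = (0, 1, 2)
N_VARS = 3                                  # h1, h2, h3

# Each start-to-end path through the lattice chooses one value per variable,
# so the paths are exactly the assignments (h1, h2, h3).
paths = list(product(POSITIONS, repeat=N_VARS))

n_paths = len(paths)                        # 3^3 = 27 assignments
n_nodes = N_VARS * len(POSITIONS) + 2       # 9 lattice nodes, plus start and end
```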
[00:05:24] So this is just a very compact way of representing all exponentially many assignments in a polynomial-sized object.

[00:05:35] Now I'm going to attach weights to the edges. An edge from start to any of the initial states has as its weight the start probability times the first emission probability. For example, this edge here has weight p(h1 = 0) times p(e1 = 0 | h1 = 0), because e1, remember, was observed to be 0, so I've plugged in that evidence. The subsequent edges go between some h_{i-1} and some h_i, and each has as its weight the transition probability times the emission probability of the destination state. For example, this edge here has weight p(h2 = 0 | h1 = 0), the transition, times p(e2 = 2 | h2 = 0), where e2 = 2 is what we observed as evidence. And this one is p(h3 = 0 | h2 = 0) times the corresponding emission probability. The final edge into the end state doesn't have anything on it, so assume its weight is 1.

[00:06:53] Now each path from start to end, as we stated before, is an assignment of all the variables, but in particular it has a weight equal to the product of the edge weights. So this path here has a weight which is simply the product of all these purple numbers, and that weight is actually the joint probability of this particular assignment and the evidence.

[00:07:32] Okay, so now the key part: a smoothing query such as P(H_i = h_i | E = e) is simply the weighted fraction of paths through H_i = h_i. For example, if I'm interested in the probability of h2 = 1 conditioned on the evidence, what I'm really asking, in the context of this lattice, is: what fraction of paths pass through this node, compared to all paths? Stated differently, I look at all the paths through this node, add up their weights, and divide by the sum of the weights over all paths. This gives us a really nice graphical interpretation of smoothing queries.

[00:08:32] So now we can compute those quantities using a recurrence. I'm going to define two types of objects: forward messages and backward messages. Here's our lattice. The forward message for each node here is written F_i(h_i), and it is the sum of the weights of paths from the start to a particular node H_i = h_i. So for example, F_2(1) is the sum of the
weights of all paths from start to h2 = 1. I can compute this recursively as follows: all paths that go from start to this node have to pass through some previous position, so I sum over all possible values h_{i-1} of the previous variable, recursing on F_{i-1}(h_{i-1}), the sum of the weights of paths to each of these previous locations, times the weight along the edge from that particular h_{i-1} to h_i.

[00:10:05] The backward message is analogous: B_i(h_i) is the sum of the weights of all paths from a particular node H_i = h_i to the end. So B_2(1) is the sum over all paths from this node to the end. This again is recursively defined by looking at all next nodes h_{i+1}, recursing on B_{i+1}(h_{i+1}), times the weight of the edge between h_i and h_{i+1}.
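Writing w(h_{i-1} → h_i) for the edge weight (transition times emission, as defined above), the two recurrences with their base cases are:

```latex
F_1(h_1) = p(h_1)\, p(e_1 \mid h_1), \qquad
F_i(h_i) = \sum_{h_{i-1}} F_{i-1}(h_{i-1}) \,\underbrace{p(h_i \mid h_{i-1})\, p(e_i \mid h_i)}_{w(h_{i-1} \to h_i)}

B_n(h_n) = 1, \qquad
B_i(h_i) = \sum_{h_{i+1}} \underbrace{p(h_{i+1} \mid h_i)\, p(e_{i+1} \mid h_{i+1})}_{w(h_i \to h_{i+1})} \, B_{i+1}(h_{i+1})
```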
[00:10:53] Okay, so now, having defined forward and backward messages, I can multiply them together to form S_i, and my claim is that the sum of the weights of all paths from start to end that go through a particular node is exactly S_i(h_i). For example, looking at this node again: F_i accounts for all the ways to get from start to this node, and B_i accounts for all the ways to get from this node to the end, so if I multiply those two quantities together, I get all the paths from start to end that go through this node.

[00:11:43] So now we're almost done. We can take these S_i's and normalize them over all the possible values that h_i could take, and that gives us exactly the probability of H_i = h_i given the evidence. This is exactly the smoothing quantity we were looking for: what is the probability of h2 = 1 conditioned on the evidence?

[00:12:15] Putting things together, the forward-backward algorithm simply computes all the forward messages, proceeding from 1 to 2 to 3 all the way up to n, where F_i depends on F_{i-1}, so I'm going forward. Then it computes all the backward messages, going from n down to 1, because B_i depends on B_{i+1}. Then I multiply the F_i and the B_i together to compute S_i and normalize, and that gives me the answer to the smoothing question.

[00:12:53] As for the runtime of this algorithm: we have n time steps, and for each time step there is a number of domain elements to consider, so n times the domain size is the number of nodes in the lattice; and there is another multiplicative factor of the domain size to compute the recurrence, giving O(n · |Domain|²).
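Putting the whole algorithm together in code, here is a minimal sketch for the object-tracking model, with the same assumed zero-weight handling of out-of-range steps as before; the names F, B, and S follow the lecture's messages.

```python
POSITIONS = (0, 1, 2)
STEP = {-1: 0.25, 0: 0.5, +1: 0.25}

def w_start(h, e):                      # start edge: p(h1) * p(e1 | h1)
    return (1 / 3) * STEP.get(e - h, 0.0)

def w_edge(h_prev, h, e):               # p(h_i | h_{i-1}) * p(e_i | h_i)
    return STEP.get(h - h_prev, 0.0) * STEP.get(e - h, 0.0)

def forward_backward(es):
    """Smoothing distributions P(h_i | e) for every i, via F, B, S messages."""
    n = len(es)
    # Forward pass: F[i][h] = sum of path weights from start to node (i, h).
    F = [{h: w_start(h, es[0]) for h in POSITIONS}]
    for i in range(1, n):
        F.append({h: sum(F[i - 1][hp] * w_edge(hp, h, es[i]) for hp in POSITIONS)
                  for h in POSITIONS})
    # Backward pass: B[i][h] = sum of path weights from node (i, h) to end.
    B = [None] * n
    B[n - 1] = {h: 1.0 for h in POSITIONS}
    for i in range(n - 2, -1, -1):
        B[i] = {h: sum(w_edge(h, hn, es[i + 1]) * B[i + 1][hn] for hn in POSITIONS)
                for h in POSITIONS}
    # S_i = F_i * B_i, normalized over the values h_i can take.
    smoothed = []
    for i in range(n):
        S = {h: F[i][h] * B[i][h] for h in POSITIONS}
        Z = sum(S.values())
        smoothed.append({h: S[h] / Z for h in POSITIONS})
    return smoothed
```

Here forward_backward([0, 2, 2])[1] is the smoothing distribution over h2 given all three observations; note that the single pair of passes yields the answer for every i at once.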
recurrence, and this is exactly the number of edges in this lattice.

[00:13:23] One other note: the forward-backward algorithm actually computes all the smoothing queries, for every i, and the time complexity for computing all of them is exactly the same as for computing any individual one. That's because there's a lot of shared computation: the forward message computed here is used down here and here, and the same goes for the backward messages in the other direction.

[00:13:58] So let's look at a quick demo of this in action. Here is the object-tracking HMM. We have h1 through h3, and we have the various local probabilities p(h1), p(e1 | h1), p(h2 | h1), p(e2 | h2), p(h3 | h2), and so on. Now I'm interested in the probability of h2. Notice that I'm actually not going to run forward-backward; I'm going to run a more general algorithm called variable elimination. The details are a little bit different, so don't worry about them too much; I just want to give you a flavor of how it works.

[00:14:48] The first thing I do is compute a factor, which is actually the forward message, where I've summed out the previous time step, h1. Then I compute another factor, the backward message, which sums out h3. Then I multiply them together, and I get the probability of h2: 0.61 and 0.3 for the first two values.

[00:15:31] All right, so to summarize: we've presented the forward-backward algorithm for probabilistic inference in HMMs, in particular for answering smoothing questions.
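The smoothing computation walked through in the demo can be sketched numerically. This is a minimal sketch of forward-backward, assuming illustrative transition and emission tables (a random walk over {0, 1, 2}, renormalized at the boundaries) rather than the demo's exact numbers:

```python
# A minimal numeric sketch of forward-backward smoothing on a 3-state
# object-tracking HMM. The transition/emission tables are illustrative
# stand-ins (random-walk style), not necessarily the demo's exact numbers.

def walk_table(n=3, p_stay=0.5, p_move=0.25):
    """p(next | cur): stay w.p. 1/2, move +/-1 w.p. 1/4, renormalized at edges."""
    table = []
    for a in range(n):
        row = [p_stay if b == a else p_move if abs(b - a) == 1 else 0.0
               for b in range(n)]
        z = sum(row)
        table.append([p / z for p in row])
    return table

trans = walk_table()        # p(h_{i+1} | h_i)
emis = walk_table()         # p(e_i | h_i), same shape for simplicity
start = [1 / 3] * 3         # p(h1) uniform over {0, 1, 2}
evidence = [0, 2, 2]        # observed e1, e2, e3

# Forward messages: F_i(h) = p(h_i = h, e_1..i)
F = [[start[h] * emis[h][evidence[0]] for h in range(3)]]
for e in evidence[1:]:
    prev = F[-1]
    F.append([sum(prev[a] * trans[a][h] for a in range(3)) * emis[h][e]
              for h in range(3)])

# Backward messages: B_i(h) = p(e_{i+1}..e_n | h_i = h)
B = [[1.0] * 3]
for e in reversed(evidence[1:]):
    nxt = B[0]
    B.insert(0, [sum(trans[h][b] * emis[b][e] * nxt[b] for b in range(3))
                 for h in range(3)])

# Smoothing: p(h_i | e_1..n) is proportional to F_i(h) * B_i(h). All queries
# reuse the same messages, which is the shared computation noted above.
for i in range(3):
    s = [F[i][h] * B[i][h] for h in range(3)]
    z = sum(s)
    print(f"p(h{i + 1} | e1:3) =", [round(p / z, 3) for p in s])
```

Because F and B are computed once, the final loop reads off every smoothing marginal at no extra asymptotic cost, which is exactly the shared-computation point made above.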
The key idea behind forward-backward is the lattice representation, which allows us to compactly represent paths as assignments, with the weight of each assignment being the product of the edge weights. That lets us define a dynamic program which computes the forward and backward messages in an efficient way. If you then multiply the forward and backward messages and normalize, you can compute all the smoothing queries you want in the same amount of time as computing any one of them, because there's a lot of shared computation.

[00:16:25] All right, that's the end.

================================================================================
LECTURE 037
================================================================================
Bayesian Networks 6 - Particle Filtering | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=8sOtXbQIOuE
---

Transcript

[00:00:05] Hi. In this module I'm going to present the particle filtering algorithm for performing approximate inference in hidden Markov models, which is really useful when the size of the domain of the variables is very large.

[00:00:18] So let's start with our familiar object-tracking HMM. For every time step we have a position h_i of an object, which we don't see; instead we see some noisy sensor readings. The probabilistic story of object tracking is as follows: the first object position, h1, is generated uniformly across the values 0, 1, 2. Subsequently, h2 is generated conditioned on h1 via the transition distribution, which goes up with probability one quarter, stays the same with probability one half, and goes down with probability one quarter, and similarly for h3. At each time step we get a sensor reading e1, e2, e3 conditioned on the respective location, governed by the emission distribution, which, given h_i, goes up by one with probability one quarter, stays the same with probability one half, and goes down by one with probability one quarter.

[00:01:25] Now you multiply all of these local conditional distributions together, and you get one glorious joint distribution over all the object locations as well as the sensor readings. That's our HMM.

[00:01:41] Given this HMM, we can ask all sorts of questions; in particular, we've looked at filtering and smoothing questions. Particle filtering, as the name might suggest, does filtering, so let's focus on filtering. In filtering, we're asking for the position of the object at a particular time step conditioned on the past evidence. At time step one, I look only at the evidence at time step one and ask: where is the object? At time step two, I now have two observations, and I ask where the object is at time step two. At time step three, I have three observations, and I ask where the object is at time step three.

[00:02:30] Now, I could apply the forward-backward algorithm to this scenario,
and that would work. But the problem is: suppose you have a setting where there are many, many location values for each h_i. In our simple example there are only three, but in practice there could be a hundred thousand, and in that case forward-backward is going to be really, really slow, because its running time scales quadratically with the number of values: a hundred thousand squared, which is not nice. The goal of particle filtering is to exploit the observation that, while you may have a hundred thousand possible values, only a very small fraction of them are really likely given the data.

[00:03:15] To start introducing particle filtering, let us revisit beam search, because structurally particle filtering and beam search are analogous. In beam search, remember, the idea was to keep track of at most k partial assignments, which we're going to call particles in the particle filtering lingo. Beam search starts with a candidate set of only one assignment, the empty assignment. It goes through each variable in turn, from 1 to n. For each partial assignment over h1 through h_{i-1}, I consider all possible values I can assign to h_i and extend the assignment, so now we get a bunch of assignments over h1 through h_i. Then I compute the weight of each of these candidate particles, and I take the k highest-weight particles.

[00:04:24] So let's recall beam search on this example. Here we have our object-tracking HMM, with the variables and all the local conditional distributions, and I'm observing 0, 2, 2. Beam search starts out extending to variable h1, and it produces 0, 1, and 2 as the possible particles, with these probabilities as their weights. I prune, which does nothing because k is three. Next, I extend to h2: each of these particles multiplies into three particles, and the weight of each particle now also includes the factors for the transition into h2 and the emission p(e2 = 2 | h2). Now I prune down to three. I extend to h3, and I prune, and at the end I get a set of particles: here I have 0 1 2, I have 0 1 1, and I also have 1 2 2, and each particle has some weight.

[00:05:56] Normally we presented beam search in the context of finding maximum-weight assignments, so in this case you would just return the highest-weight particle, 1 2 2. But in particle filtering we're interested in answering filtering queries. So what we do instead is normalize the weights over all the particles to get an approximate distribution over assignments. We then pretend this is the joint distribution over h1 through hn given the evidence, and read off probabilities to answer, approximately, any smoothing or filtering query we like.

[00:06:42] This is fine, but it has two problems. One is that the extend step is slow, because it requires considering every possible value of h_i. Sometimes you can be clever: you don't have to enumerate all the values in the domain, only the values that are going to have positive weight. But even that could be a lot.
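The extend/prune procedure recalled above, followed by the normalization used for filtering, might be sketched as follows; the tables, the beam size k, and the evidence sequence here are illustrative assumptions:

```python
# A hypothetical sketch of beam search over an HMM, as described above:
# extend each particle with every value of the next hidden variable, keep the
# k highest-weight particles, and finally normalize the surviving weights
# into an approximate distribution over assignments.

def beam_filter(start, trans, emis, evidence, k):
    beam = [((), 1.0)]                       # (partial assignment, weight)
    for i, e in enumerate(evidence):
        candidates = []
        for assign, w in beam:               # extend: every value of h_i
            prior = start if i == 0 else trans[assign[-1]]
            for h, p in enumerate(prior):
                if p > 0:
                    candidates.append((assign + (h,), w * p * emis[h][e]))
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]  # prune
    z = sum(w for _, w in beam)
    return {a: w / z for a, w in beam}       # normalized, approximate

# Illustrative 3-state tables: stay w.p. 1/2, move +/-1 w.p. 1/4,
# renormalized at the boundaries.
trans = [[2/3, 1/3, 0], [1/4, 1/2, 1/4], [0, 1/3, 2/3]]
emis = trans                                  # same shape, for illustration
start = [1/3] * 3
print(beam_filter(start, trans, emis, [0, 2, 2], k=3))
```

Note the extend loop visits every value in the domain of h_i, which is the slowness problem just discussed.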
[00:07:16] The second problem is that we are greedily taking the k particles with the highest weight, and, as we'll see later, this doesn't provide enough diversity. So we're going to have to do something about it. Particle filtering solves both of these issues. It consists of three steps, propose, weight, and resample, which replace the extend-prune two-step procedure of beam search. We'll go through them in turn.

[00:07:54] The first step is propose. In general, you should think of the set of particles as approximating a certain distribution; in particular, the filtering distribution is the probability of the variables you're considering conditioned on the evidence so far. Suppose we have just two particles: 0 1 and 1 2. The propose step takes each of these particles, and I just sample a value for h3, the next variable, from the transition distribution, which, remember, was up or down with probability one quarter and the same with probability one half. That produces these extended particles, conditioned on the same evidence. For example, I take 0 1 and extend it; this produces the particle 0 1 1 with probability one half, because I'm keeping the value the same. And I take the other particle and extend it to 2, which also happens with probability one half. Now, this is a randomized algorithm, so I could have sampled differently from the distribution: I could have gotten a 1 here, or a 3 here. But let's just go with 1 2 2.

[00:09:32] In the next step, I'm going to weight. You should think of these particles as a guess as to what h3 is going to be, but we need to fact-check this guess against the evidence. So the weighting step assigns a weight to each particle, and that weight is the probability of the new evidence conditioned on h3. This produces a set of new, weighted particles representing the distribution over h1, h2, h3 conditioned on all the evidence so far.

[00:10:23] Let's work out this example. For the first particle I have h3 = 1, and h3 = 1 generates the evidence e3 = 2 with probability one quarter, so I attach a weight of one quarter to the first particle. The second particle (this should be a 2 here, actually) has h3 = 2, and looking up the table, the probability of generating a 2 given a 2 is one half, so I put a weight of one half on this particle.

[00:11:08] At this point I have a set of particles that represent the advanced filtering distribution. But notice that the weights are not all the same: some are small and some are big. In particular, particles with small weight are kind of wasting space. You should think of the k particles as a limited resource for representing this distribution, so if you have a particle with weight 0.0001, or maybe even 0, then we certainly shouldn't be wasting one of the valuable k slots on that value. So what we're going to do is reallocate our resources via resampling.
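The propose and weight steps on this two-particle example can be sketched as below, using the lecture's stay-with-probability-1/2, move-with-probability-1/4 numbers; the helper names are hypothetical:

```python
import random

# A sketch of the propose and weight steps on the two-particle example above,
# using the lecture's transition/emission numbers: stay with probability 1/2,
# move one step up or down with probability 1/4 each.

def p_step(a, b):
    """p(b | a) for both the transition and the emission distribution."""
    if b == a:
        return 0.5
    if abs(b - a) == 1:
        return 0.25
    return 0.0

particles = [(0, 1), (1, 2)]    # current guesses for (h1, h2)
e3 = 2                          # newly observed evidence

def propose(particle):
    """Extend a particle by sampling h3 from the transition distribution."""
    h2 = particle[-1]
    candidates = [h2 - 1, h2, h2 + 1]
    h3 = random.choices(candidates, weights=[0.25, 0.5, 0.25])[0]
    return particle + (h3,)

def weight(particle):
    """Fact-check the guess against the evidence: weight = p(e3 | h3)."""
    return p_step(particle[-1], e3)

random.seed(0)
extended = [propose(p) for p in particles]
weights = [weight(p) for p in extended]
print(extended, weights)
```

With the extensions 0 1 1 and 1 2 2 sampled in the lecture, the weights come out to 1/4 and 1/2, which normalize to the 1/3 and 2/3 used next in the resampling step.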
In the resampling step, we normalize these weights and draw k samples. Normalizing the weights produces the distribution one third, two thirds, and now I draw k samples from that distribution to redistribute. The resulting particles still represent the same distribution, just in a slightly different way, without weights. So I sample: maybe I get 1 2 2, which happens with probability two thirds. I sample again, and maybe I get the same particle again, with probability two thirds. Of course, this again is a randomized algorithm, so I could have gotten the first particle and then the second, or the second and then the first, or the first one twice.

[00:12:56] Now you might wonder: why are we resampling? Why leave the result of the algorithm up to chance? To see why, consider the following setting. We have a distribution over a bunch of possible locations, and suppose that distribution is very close to uniform: maybe you can see that there's slightly higher probability in the middle, but it's pretty flat. Now if you did beam search, which takes the k positions with the highest weight, you would end up with all the particles clustering around the middle, which is really not representative of the distribution, because all these positions out here have non-negligible probability mass but get no support. It's kind of like putting all your eggs in the same basket, or the same k baskets, I guess. Instead, if you resample, that is, sample from this distribution k times, you get something more like this, which I would argue is more representative of the distribution.
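This diversity argument can be made concrete with a small sketch comparing top-k selection against resampling on a near-uniform distribution; all the numbers here are illustrative:

```python
import random
from collections import Counter

# Near-uniform weights over 21 locations, with a slight bump in the middle.
locations = list(range(21))
weights = [1.2 if 8 <= x <= 12 else 1.0 for x in locations]
z = sum(weights)
probs = [w / z for w in weights]
k = 5

# Beam-search style: keep the k highest-weight locations.
top_k = sorted(locations, key=lambda x: -weights[x])[:k]

# Particle-filtering style: draw k independent samples from the distribution.
random.seed(0)
resampled = random.choices(locations, weights=probs, k=k)

print("top-k:    ", sorted(top_k))         # clusters around the middle bump
print("resampled:", sorted(Counter(resampled).elements()))
```

Top-k deterministically returns the five middle locations, while resampling typically spreads the particles across the whole range, preserving the uncertainty in the distribution.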
In cases where most of the weight is on a few locations, sampling versus taking the top k is not really a big deal. But when there's high uncertainty, as in this example, sampling is really important because it allows you to maintain uncertainty.

[00:14:31] So now we're ready to present the final particle filtering algorithm, which again is structured very similarly to beam search. Like beam search, we initialize with the empty assignment, and for each time step we propose, weight, and resample. In the propose step, we take each assignment to h1 through h_{i-1}, look at the transition distribution, and generate one possible assignment to h_i, and I just take that. In beam search I would consider all of them, which can result in a blow-up, but with particles I only look at one. Then I weight the particles based on the evidence, using the emission distribution. And finally I redistribute my resources by normalizing the weight distribution and drawing k particles independently from it.

[00:15:41] Okay, so let's see a demo. Here I have my object-tracking HMM, and I'm going to run particle filtering with 100 particles instead of beam search. I start out by extending to just the first variable, and now I have a hundred particles: 38 of them are at 0, 33 of them at 1, and 29 of them at 2, and these are the weights. I resample, and now probability is redistributed to 0 and 1 with these particle counts. Now I extend, and notice that where before I had 73 particles at 0, now 51 of them go to h2 = 0 and 22 of them go to h2 = 1.
to [00:16:52] and then i'm going to resample [00:16:55] resample which redistributes the particles again [00:16:58] which redistributes the particles again now i'm going to propose and [00:17:02] now i'm going to propose and re-weight now all the particles [00:17:05] re-weight now all the particles are [00:17:06] are all over the place [00:17:08] all over the place and now i redistribute mass [00:17:11] and now i redistribute mass so that the particles are [00:17:13] so that the particles are used more effectively [00:17:16] used more effectively okay so now at the end [00:17:19] okay so now at the end i have now a [00:17:21] i have now a 100 particles [00:17:23] 100 particles covering all these different assignments [00:17:25] covering all these different assignments i can simply [00:17:27] i can simply count the fraction of them [00:17:29] count the fraction of them for [00:17:30] for that satisfied various values of h3 and [00:17:32] that satisfied various values of h3 and i get my approximate filtering [00:17:35] i get my approximate filtering distribution over h3 condition [00:17:42] okay so there are two [00:17:44] okay so there are two ways to make particle filtering more [00:17:47] ways to make particle filtering more efficient [00:17:48] efficient so particle filtering we've casted [00:17:52] so particle filtering we've casted in terms of generating a distribution [00:17:54] in terms of generating a distribution over [00:17:54] over complete assignments to all the [00:17:56] complete assignments to all the variables [00:17:57] variables but if you're only interested in [00:17:59] but if you're only interested in filtering queries which [00:18:01] filtering queries which look at the last variables [00:18:03] look at the last variables then what we can do is instead of [00:18:06] then what we can do is instead of storing all [00:18:08] storing all um the assignments we only we only need [00:18:11] um the assignments we only we only need to keep the value of 
the last hi. so i'm only going to look at h3, because this is sufficient to continue the algorithm forward. [00:18:25] and furthermore, if you have multiple particles that have the same value, you can actually just store the counts; as we saw in the demo, one occurs twice and two occurs three times. [00:18:40] now let's visualize particle filtering in a more realistic, interactive object tracking setting. okay, so here we have a grid, and we have an object that's going to be moving in this grid, where we're trying to determine its location. [00:19:01] so the hmm is going to have a transition distribution that places a uniform distribution over moving north, south, east, west, or staying put, and the emission distribution is going to put a uniform distribution over locations that are within three steps either vertically or horizontally. so you can kind of see this
definition of this emission distribution, which only depends on the x-distance and the y-distance, and it's going to put a uniform distribution over basically a box. [00:19:38] okay, so if i hit ctrl-enter here, we can see the observations; they're very noisy, and we're trying to guess where the object is. so i don't know, it's somewhere. [00:19:53] so what we're going to do is run particle filtering; let's say we have 10,000 particles. we hit ctrl-enter again, and now what we're going to see is a red blob, and this represents where the particles are, with the intensity representing the number of particles at that particular location. [00:20:24] so you'll see that this is kind of our best guess of where the object is. [00:20:32] okay, so you can see how well we're doing by showing the true position. so
let's see where this object actually is, and we'll see that we're tracking it, you know, rather well. sometimes, i think you'll notice, it might mess up, but on the whole it's pretty good. [00:20:52] so also notice that the red blob, where it thinks the object is, is not fooled by where the observation is, because there's enough noise here; what the particle filter is doing is essentially smoothing out the noise. the noise is jumping around a lot, but it's kind of tracking, and it knows that the object can't be teleporting; it's moving by at most one step each time step. [00:21:29] so you can play with this demo a bit more. we've also implemented gaussian noise instead of this box noise, which looks kind of similar, more of a spherical blob. you can also play with a kind of really weird-looking noise, which
places uniform distributions over all positions on this lattice that have a certain kind of parity. [00:22:01] okay, so in summary, we've presented the particle filtering algorithm, which allows us to answer filtering questions of the following form: where is this object at a particular time step, given the evidence so far? and the key idea is using particles to represent this approximate distribution. [00:22:26] so remember, particle filtering has three steps, which are used to advance the set of particles. first we propose, where we take particles and transition them to the next time step; this is a guess of where the object is going to be at the next time step. then we're going to fact-check our guess by re-weighting the particles based on the emission distribution of what we actually saw. and then we're going to reallocate our resources by resampling, and this
will allow the particles to occupy the regions with higher weight. [00:23:14] so unlike the forward-backward algorithm, particle filtering allows us to scale up to cases where there are a large number of locations. and also, unlike beam search, it allows us to maintain better particle diversity, especially in situations where the distribution is close to uniform. [00:23:36] now, particle filtering is also called sequential monte carlo, and there are many, many more sophisticated extensions that i haven't covered. in particular, particle filtering works for general factor graphs, not just hidden markov models, and i encourage you to read up and learn more about it. that's all ================================================================================ LECTURE 038 ================================================================================ Bayesian Networks 7 - Supervised Learning | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=_rbDjsJTgm8 --- Transcript [00:00:05] so far we've introduced bayesian networks and talked about how to perform
[00:00:09] networks and talked about how to perform inference in them in this module we'll [00:00:11] inference in them in this module we'll turn to the question of how to learn [00:00:13] turn to the question of how to learn them from [00:00:15] them from so recall that a bayesian network [00:00:17] so recall that a bayesian network consists of a set of random variables [00:00:19] consists of a set of random variables for example [00:00:20] for example of cold allergies [00:00:22] of cold allergies off [00:00:23] off and itchy eyes [00:00:25] and itchy eyes the bayesian network also comes equipped [00:00:27] the bayesian network also comes equipped with a dag specifying the qualitative [00:00:30] with a dag specifying the qualitative relationships between all these [00:00:33] relationships between all these different variables [00:00:35] different variables quantitatively however the bayesian [00:00:37] quantitatively however the bayesian network defines [00:00:39] network defines a set of local conditional distributions [00:00:41] a set of local conditional distributions over each variable x i given the parents [00:00:45] over each variable x i given the parents of i [00:00:48] and in this example we would have [00:00:50] and in this example we would have probability of c given parents which are [00:00:52] probability of c given parents which are none probability of a probability of h [00:00:55] none probability of a probability of h given its two parents c and a and [00:00:57] given its two parents c and a and probability of i given a [00:01:01] probability of i given a so finally if we multiply all of these [00:01:04] so finally if we multiply all of these probability distributions together [00:01:07] probability distributions together then we get [00:01:08] then we get what is the joint distribution all [00:01:11] what is the joint distribution all random variables in this case we have a [00:01:15] random variables in this case we have a c [00:01:16] c a 
h, and i. [00:01:22] then there's the question of how you do inference in bayesian networks. so for inference, remember, you're given the bayesian network, you're given some evidence that you observe over a subset of the variables, for example h and i equal one and one, and then you're given a query variable, which is something that you're interested in, let's say cold. and the inference algorithm is going to produce a distribution over your query variables conditioned on the evidence; so for every possible setting of the query variable we have a probability. [00:01:56] so we saw many ways of doing this, including manually by exhaustive enumeration; we can convert bayesian networks into markov networks and do gibbs sampling; and then for hmms we have specialized techniques such as the forward-backward algorithm and particle filtering. [00:02:15] so inference assumes that all these local conditional distributions are
known. [00:02:20] but the big question is, where did all these come from? so all these numbers are called the parameters of the bayesian network, the red question marks, and in general we might not know what they are. [00:02:36] so let's try to learn them. so again, as in all learning tasks, we start with the data. in this case the training data is going to include examples where each example is a complete assignment to x; so this is the fully supervised setting, which is the simplest one to start out with. and then the learning algorithm is going to produce parameters, and the parameters are exactly all these red question marks; these are all the local conditional probabilities. [00:03:06] so we're going to go through a bunch of examples and then later show a general principle that ties all of them together. so you might be feeling a little bit that this might be very
challenging, because probabilistic inference assumes you know the parameters, and it was already pretty hard, both computationally and perhaps even conceptually. but it turns out that for bayesian networks, at least somewhat surprisingly, if you're learning from fully supervised data, learning actually turns out to be surprisingly easy. [00:03:42] so let's begin. so suppose you're developing bayesian networks to model how people rate movies. so let's start with the world's simplest bayesian network, which has one variable r, which represents the rating of a movie. so the joint distribution is just p of r in this case; the movie rating can be one through five. [00:04:04] so first we have to identify what the parameters are. so the parameters here, theta, are just the probability of one, the probability of two, the probability of three, the probability of four, and the probability of five. there are five parameters, and if you're a little bit clever you only need
four of them, because the five numbers have to sum to one, but for the sake of simplicity let's just say there are five parameters. [00:04:27] okay, and now you're given some training data, some ratings from users: you have a one, you have a three, you have a bunch of fours, and three fives. and now the question is, how do you estimate the parameters given the training data? [00:04:42] let's just follow our noses here. well, intuitively, you would think that the probability of a rating is proportional to the number of occurrences of that particular rating in the training data. now, this is just intuition; it might be a good thing or it might not be a good thing, we'll find out later, but let's just go with that for now. [00:05:03] so here's the training data, and what i'm going to do: the parameters are a probability table, so we're going to see a
lot of these over the course of the next few slides. so for every rating, i'm going to count the number of times it shows up: 1 shows up once, 2 shows up 0 times, 3 shows up once, 4 shows up five times, and 5 shows up three times. and now i'm just going to sum up all the counts, which gives me 10, and i'm going to normalize to get my probabilities, and that's the probability estimate. that's it: count and normalize. [00:05:41] okay, so let's level up a little bit and talk about two variables. suppose that now the rating is governed by the genre; so in particular, the bayesian network is: you first generate the genre, and then you generate the rating given the genre. [00:05:58] so now the parameters of this bayesian network include both the probability of the genre, which has two parameters, and the probability of the rating given the genre, which includes two times five parameters, so 10
parameters, for a total of 12 parameters; again, if you're being clever, you can get that down to nine. [00:06:19] so now we're given some training data. each training point, remember, is a full assignment to all the variables, so we have g equals d and r equals four here. [00:06:35] so now, how do we estimate the parameters given this more complicated network? so following our noses again, the intuitive strategy is that we're just going to estimate each local conditional distribution separately and see what happens. okay, so what does that mean? that means for the probability of g, i'm just going to count the number of times particular values of g show up: so d shows up one, two, three times, and c shows up twice. notice that this is kind of the same calculation as we had before. so now this is three-fifths and two-fifths if you sum up and normalize. [00:07:19] okay.
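the count-and-normalize recipe just described can be sketched in a few lines of code. this is an illustrative sketch, not the course's own code: the helper name `count_and_normalize` and the tiny dataset (genre labels 'd' and 'c') are assumptions, chosen only to reproduce the counts read off in the lecture.

```python
from collections import Counter

def count_and_normalize(pairs):
    """MLE of a local conditional distribution p(child | parents).

    `pairs` is a list of (parents, child) tuples; counts are normalized
    separately within each parents assignment (the "slice" of the data).
    """
    counts = Counter(pairs)
    totals = Counter(parents for parents, _ in pairs)
    return {(parents, child): n / totals[parents]
            for (parents, child), n in counts.items()}

# Hypothetical training set matching the lecture's counts:
# d shows up 3 times, c twice; d4 twice, d5 once, c1 once, c5 once.
examples = [('d', 4), ('d', 4), ('d', 5), ('c', 1), ('c', 5)]

# p(g): condition on the empty parent assignment ().
p_g = count_and_normalize([((), g) for g, _ in examples])
# p(r | g): slice by genre, then normalize within each slice.
p_r_given_g = count_and_normalize([((g,), r) for g, r in examples])

print(p_g)          # p(g=d) = 3/5, p(g=c) = 2/5
print(p_r_given_g)  # e.g. p(r=4 | g=d) = 2/3, p(r=1 | g=c) = 1/2
```

the same function handles every local conditional distribution in the lecture, because the only thing that changes is which variables go into the `parents` tuple.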
so in estimating p of g, i simply look at only the slice of the examples that matter for this. and same with the probability of r given g: now i'm going to look at all the possible assignments to the parent of a particular node and also that node itself, so that's g and r. so d4 shows up twice, d5 shows up once, c1 shows up once, and c5 shows up once. now i count and normalize, and i get my probability estimate of r given g. [00:08:00] okay, so far so good. so in summary: consider each local conditional distribution separately, and then count based on the slice of the data that matters, and normalize. [00:08:13] so now let's consider three variables. we have the genre, whether the movie won an award or not, and the rating. so here we have the genre and whether it won an award influencing how well the movie is rated. the joint distribution
is p of g, times p of a, times p of r given g and a. [00:08:35] so now we have local conditional distributions for each of these factors here. [00:08:44] so remember that v-structures, this type of structure, were really special in bayesian networks: they give rise to explaining away, and the thing is that if you marginalize out unobserved leaves, the parents remain independent; it was really a hallmark of v-structures. but from the perspective of learning, there's really nothing special here. [00:09:06] and to see this, what we're going to do is just, you know, suppose we have some training data which includes assignments to all three variables; we're just going to count and normalize again. okay, so here we're going to start with p of g; this is exactly the same thing as before, we just look at only the genre. and then we're going to look at a, which analogously means looking at only 0, 1, 0, 1, and
counting and normalizing. and now the big local conditional distribution is p of r given g and a. so here i'm going to look at the parents of r, and r itself, and i'm going to count the number of times each local configuration happens. so i have d01 showing up once, d03 showing up once, and d15 showing up once, each of these showing up once. and now i want to normalize, so i have to be a little bit careful: i don't want to add all these numbers together and normalize, because this is conditioned on g and a. so that means for every setting of g and a, i have a distribution over r. [00:10:36] i'm going to look at d0: i have one occurrence of r equals one and one occurrence of r equals three, so if i normalize that, it's going to give me a half and a half. and now for this setting of g and
a, i only have one possibility of r, so that has probability one, and same for these other ones. [00:11:02] so again, everything is count and normalize, where you have to pay attention to what you're normalizing over: you're only normalizing over possible values of r, not g and a. [00:11:19] so one thing you might note is that a lot of these probabilities are one, and the probabilities that are not mentioned here are zero. so you might wonder whether this is a good estimate, but we'll come back to that later. [00:11:34] so now let's invert the v-structure and look at a different structure. we have the genre, and suppose we have two people, jim and martha, who are both going to rate this movie, and both of their ratings depend on the genre; so g generates r1 and also generates r2. [00:11:55] so now we have this three-node bayesian network, and the estimation is going to be the same; i'll just go through it very
we have parameters one for [00:12:08] quickly so we have parameters one for every variable here [00:12:10] every variable here and so [00:12:12] and so probability of [00:12:14] probability of g [00:12:15] g is count to normalize [00:12:17] is count to normalize probability of [00:12:19] probability of r1 [00:12:20] r1 given g [00:12:21] given g is you count and normalize again [00:12:23] is you count and normalize again remember that [00:12:25] remember that i'm normalizing over [00:12:27] i'm normalizing over possible values of g so you can [00:12:29] possible values of g so you can partition the rows based on the value of [00:12:31] partition the rows based on the value of g so here i have [00:12:33] g so here i have two and one and i'm normalizing two [00:12:35] two and one and i'm normalizing two thirds and one thirds and g equals c is [00:12:38] thirds and one thirds and g equals c is just handled uh separately [00:12:40] just handled uh separately in a separate normalization [00:12:43] in a separate normalization and then um [00:12:44] and then um [Music] [00:12:45] [Music] r2 given g is analogous so i'm not going [00:12:48] r2 given g is analogous so i'm not going to go over this [00:12:51] so this is fine um [00:12:54] so this is fine um except for what i'm going to now do is [00:12:58] except for what i'm going to now do is think [00:12:58] think about the setting where suppose you have [00:13:01] about the setting where suppose you have not just two users but a thousand users [00:13:04] not just two users but a thousand users or a million users now you might be a [00:13:07] or a million users now you might be a little bit worried because now for every [00:13:10] little bit worried because now for every user you might have to [00:13:12] user you might have to have its own have their own [00:13:15] have its own have their own local conditional distribution [00:13:18] local conditional distribution and the number of parameters might just [00:13:20] and the 
[00:13:23] That means estimation might be hard, especially for new users. So we're going to consider something slightly different. It's going to be the same Bayesian network here, but the parameters are different: in particular, I'm going to consider a single parameter p of r given g, instead of having p of r1 and p of r2. So now how do I estimate the distributions of this model? Let's begin. The probability of g is the same as before. And now for the probability of r given g, I'm just going to count the number of times a particular local configuration shows up, either where r is r1 or r2. So d3 shows up once, here. D4 shows up three times: you have one, and two, and three. So notice I'm counting occurrences of both r1 and r2. And d5 shows up twice, here with r1 and here with r2. C1 shows up once, and c2 shows up once.
[00:14:49] C4 shows up once, and c5 shows up once as well. Now I just count and normalize: I look at all the d's, count them, sum and normalize; and I look at all the c's, count and normalize. Okay, so when I have only one distribution that is responsible for two nodes, I simply aggregate their counts and normalize. [00:15:16] So this is an important slide. The more general idea that I want to highlight is this idea of parameter sharing in Bayesian networks, and this happens when the local conditional distributions over different variables are actually the same. And to be very precise about that, I want you to look at the following picture. So we have g, r1, and r2. So far we've looked at Bayesian networks through the lens of inference, where we know that every variable comes with a local conditional distribution.
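The aggregate-and-normalize step for a shared distribution can be sketched in a few lines of Python (the toy (g, r1, r2) assignments below are hypothetical, not the slide's exact table):

```python
from collections import Counter, defaultdict

# Hypothetical full assignments (g, r1, r2); not the slide's exact table.
data = [("d", 4, 4), ("d", 4, 5), ("d", 3, 4), ("c", 1, 5), ("c", 5, 2)]

# One shared distribution p(r | g) powers both R1 and R2, so counts
# from both nodes are aggregated into the same table.
counts = defaultdict(Counter)
for g, r1, r2 in data:
    counts[g][r1] += 1
    counts[g][r2] += 1

# Normalize within each parent configuration g.
p_r_given_g = {g: {r: c / sum(cnt.values()) for r, c in cnt.items()}
               for g, cnt in counts.items()}
```

Without sharing, r1 and r2 would each get their own `counts` table, estimated from half as much data.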
[00:15:59] But we didn't worry about where that came from; it was just there. Now, for learning, it matters where it came from. So what we should think about is each of these variables being powered by a local conditional distribution. So g is powered by this table here, r1 is powered by this table, and in the case of parameter sharing, r2 is also powered by this table. So we have a Bayesian network, and behind the scenes you should think about all these tables, which have arrows kind of hooking up and providing juice to each of these variables. And now, if you didn't have parameter sharing, then r1 and r2 would be powered by different tables. [00:16:46] Now, this is an important point. When we're doing inference, you should think about that as reading from the parameters, and when you're reading, you don't care whether you have two copies of something or one copy of something, because you're getting the same thing.
[00:17:01] But in learning, we're writing to the parameters from the observed variables, and in that case, you need to worry about whether you're writing to one memory location or two memory locations. So the right analogy is to think about programming, where you have pass by reference or pass by value. In parameter sharing, we're passing by reference: we're passing this parameter into each of these nodes, and when we do learning, we write back into those parameters, and it matters whether they're the same or not. [00:17:38] So when would you do parameter sharing like this? Well, it's a trade-off, and it's ultimately a modeling decision. By doing this, you aggregate your data, which means that you have more data per parameter, which allows you to get more reliable estimates. On the other hand, you end up with less expressive models.
[00:17:58] For example, if you had a lot of users, you might lose the ability to personalize if you share. And there are obviously many intermediate points as well, which we won't get into. [00:18:11] So let's look at some other Bayesian networks with parameter sharing. We already looked at naive Bayes before, but just to anchor it in this notation: let's say we have a genre and we have a movie review, and we have a Bayesian network which generates each word independently, conditioned on the genre. And so the joint distribution over everything is equal to the probability under p genre of y, times, for each word, the probability under p word of that particular word given y. So the parameters of this Bayesian network are p genre and p word. So now you can do a little exercise of how many parameters there are.
[00:19:12] So you look at theta. P genre: well, that's two parameters, because there are two genres. P word: that's two times the number of words, that is, the number of values that w_i can take on. And that's it. [00:19:27] So notice, importantly, that the number of parameters does not grow with L, even though the number of variables in the Bayesian network grows with L. So now we see that the complexity of the parameters and the number of variables can be quite different: you can have a million-variable Bayesian network but only one parameter, for example; that's quite possible. [00:19:50] So here's another example, our friendly HMM. We have actual positions of objects h1 through hn, and sensor readings e1 through en, and this should be very familiar by now. So you have an HMM, which has a joint distribution given by three distributions: p start of h1, times the transition probability of h_i given h_{i-1}, times, for each variable, the probability of emitting e_i given h_i.
[00:20:18] Again, the parameters are p start, p trans, and p emit. And you can think about how many parameters are in this Bayesian network: well, you have the number of positions, plus the number of positions squared, plus the number of positions times the number of possible sensor reading values. [00:20:43] Again, there is no dependence on the time window, the number of time steps here. And this is useful, because if you imagine tracking over a long period of time, you may have a million time steps, and you don't want the number of parameters to grow with that. [00:21:02] Okay, so here the training data is again going to be full assignments to all the random variables, and later, in a future module, we'll come back to the case where in practice you might only observe the sensor readings. But more on that later. So now let's present the general case.
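As a quick sanity check of that count, here is the arithmetic for a hypothetical HMM (the sizes 10 and 4 are made up for illustration):

```python
# Hypothetical sizes for illustration: the grid has 10 positions and
# the sensor can report 4 distinct reading values.
num_positions = 10
num_readings = 4

n_start = num_positions                # p_start(h1)
n_trans = num_positions ** 2           # p_trans(h_i | h_{i-1})
n_emit = num_positions * num_readings  # p_emit(e_i | h_i)
total = n_start + n_trans + n_emit     # 10 + 100 + 40 = 150

# No dependence on the number of time steps: a million-step chain
# is still described by the same 150 parameters.
```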
[00:21:22] Hopefully the intuitions have already been fleshed out, but I just want to write things down with some formal notation. So a Bayesian network, remember, includes variables x1 through xn, and now we have parameters, and the parameters are a collection of distributions. I'm going to write that as p subscript d, where d indexes into some set; for the HMM, for example, big D is {start, trans, emit}. So d is just a label, a name if you will. [00:22:01] So each variable x_i is generated from some distribution, and now the notation gets a little bit scary, but it's p sub d_i: that is the distribution that points into x_i, and I'm looking up that distribution by name. [00:22:23] So you can think about this more formally: this is just the equation defining what a Bayesian network is, that the joint distribution is the product of the local conditional distributions.
[00:22:36] But now I'm being very explicit that every variable i has a particular distribution d_i that is powering that variable. So the idea of parameter sharing is that d_i is just the same for multiple i's. [00:22:57] Okay, so here is the learning algorithm for general Bayesian networks. The input is a D train consisting of full assignments to all the variables x1 through xn, and the output is going to be all these distributions here. The algorithm is, again, just count and normalize. What we're going to do is go through every training example, which is a full assignment to all the variables, and for every variable in your Bayesian network, we're just going to increment a counter. [00:23:30] What this counter is: I look at which distribution is powering variable i, and I'm going to increment that counter for the local configuration, which is the assignment to its parents together with the value of x_i.
[00:23:45] And then I'm just going to normalize: for each distribution and each local assignment to its parents, I'm going to set the probability, under that distribution, of x_i given its parents to be proportional to this count. Okay, and that's it. [00:24:15] So far we've presented this count-and-normalize algorithm and shown a lot of examples, and hopefully this seems like a reasonable thing to do. But part of you might still be wondering: well, why is count and normalize a reasonable thing to do? And there is a higher principle here, and it's called maximum likelihood. [00:24:36] So the principle of maximum likelihood, which is a very old idea in statistics, is that we have our training data here, and we look at the product, over all examples in the training data, of the probability that the Bayesian network assigns to that data.
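A minimal Python sketch of this count-and-normalize algorithm (the network description format, `dist_of` and `parents_of`, is my own; the example at the bottom uses the five-point genre/rating data set from the lecture):

```python
from collections import Counter, defaultdict

def count_and_normalize(train, dist_of, parents_of):
    """train: list of dicts, each a full assignment variable -> value.
    dist_of[v]: name of the distribution powering variable v.
    parents_of[v]: list of parent variables of v."""
    counts = defaultdict(Counter)
    for example in train:                      # each full assignment
        for var, dist_name in dist_of.items():  # each variable
            parent_vals = tuple(example[p] for p in parents_of[var])
            counts[(dist_name, parent_vals)][example[var]] += 1
    # Normalize each distribution per parent configuration.
    cpds = {}
    for key, cnt in counts.items():
        total = sum(cnt.values())
        cpds[key] = {v: c / total for v, c in cnt.items()}
    return cpds

# Two-variable example from the lecture: G -> R.
train = [{"g": "d", "r": 4}, {"g": "d", "r": 4}, {"g": "d", "r": 5},
         {"g": "c", "r": 1}, {"g": "c", "r": 5}]
cpds = count_and_normalize(train,
                           dist_of={"g": "p_g", "r": "p_r"},
                           parents_of={"g": [], "r": ["g"]})
# cpds[("p_g", ())] == {"d": 0.6, "c": 0.4}
```

Note that parameter sharing falls out for free: pointing two variables at the same distribution name makes their counts land in the same table.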
[00:24:54] Notice I'm going to write a semicolon theta here, to recognize the fact that this Bayesian network depends on the parameters. So this is the likelihood of the data given these parameters, and maximum likelihood says: I want to tweak these parameters so that this likelihood is as large as possible. [00:25:23] So this should look a little bit more like what we were doing in the machine learning modules, where we write down a loss function which depends on parameters, and which is usually a sum over the data, and we try to find the parameters that minimize the loss. Here it's the opposite: we're trying to find the parameters that maximize the likelihood. And if you just take a log and negate it, you actually end up with loss minimization as well, but I will ignore that for now.
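Written out, the maximum likelihood objective just described, and its equivalent loss-minimization form after taking the negative log, are:

```latex
\max_{\theta}\ \prod_{x \in \mathcal{D}_{\text{train}}} P(X = x;\theta)
\quad\Longleftrightarrow\quad
\min_{\theta}\ \sum_{x \in \mathcal{D}_{\text{train}}} -\log P(X = x;\theta)
```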
[00:25:54] i will ignore that for now so intuitively this is um [00:25:57] so intuitively this is um a reasonable principle as well what [00:26:00] a reasonable principle as well what you're trying to do is for every setting [00:26:02] you're trying to do is for every setting of parameters that gives you some [00:26:04] of parameters that gives you some likelihood under the model of the data [00:26:07] likelihood under the model of the data and you just want to keep on tweaking [00:26:09] and you just want to keep on tweaking that until the likelihood as high as [00:26:12] that until the likelihood as high as possible [00:26:14] possible so [00:26:16] so having said that [00:26:18] having said that now i'm just going to claim that that [00:26:20] now i'm just going to claim that that algorithm which we call counter [00:26:21] algorithm which we call counter normalize is exactly solving the maximum [00:26:25] normalize is exactly solving the maximum likelihood objective [00:26:27] likelihood objective so this is really nice because it gives [00:26:29] so this is really nice because it gives us a closed solute form solution [00:26:32] us a closed solute form solution to this maximum likelihood objective you [00:26:34] to this maximum likelihood objective you don't have to take the gradient of this [00:26:36] don't have to take the gradient of this and iterate and worry about convergence [00:26:38] and iterate and worry about convergence also it's just done and this is one of [00:26:41] also it's just done and this is one of the reasons that makes maximum molecular [00:26:43] the reasons that makes maximum molecular estimation invasion networks so um [00:26:46] estimation invasion networks so um scalable and [00:26:48] scalable and you know intuitive is that well it is [00:26:51] you know intuitive is that well it is available [00:26:53] available that was a little bit logical [00:26:56] that was a little bit logical all right so um i haven't justified why [00:26:59] 
[00:27:01] But let me just provide you a little bit of a taste of why this might be the case. So let's take this small data set: d4, d5, and c5. If I write down the maximum likelihood objective (I have two variables here), I'm going to expand that. So I have max over theta, and theta here really is the probability of genre, the probability of rating given that the genre is c, and the probability of rating given that the genre is d. So I have three distributions here that I want to optimize, and I just expand out based on the definition of a Bayesian network. [00:27:52] I have the probability of d times the probability of rating 4 given d, and that is the probability of the first data point; times p of d and p of 5 given d, that's the second data point; and then p of c and p of 5 given c, that's the third data point.
[00:28:12] So I'm multiplying all these probabilities across all the data points, and that is the probability of the data given a particular assignment to the local conditional distributions. [00:28:30] And now, I've color-coded them on purpose, because what we can do is shuffle things around. If you just look at the probability of g (I'm maxing over that), it shows up in these three places, and it doesn't affect anything else, so I can just pull that out. And I can pull the green part out, which is p of r given c, and I can pull the blue stuff out, which is maximizing over p of r given g = d here. [00:29:07] So the punchline here is that we can decompose the maximum likelihood objective, which looks like a big tangled mess, into separate subproblems, one for every distribution and assignment to the parents of a particular variable.
[00:29:28] And now, having done that, I have just one little local optimization problem here, which is basically solved in closed form. I'm not going to do this for you, but you can introduce a Lagrange multiplier for the sum-to-one constraint, take some derivatives and set them to zero, and then you get that the maximum likelihood probability is proportional to the counts here. In this case, what we will estimate is that the probability of d is two thirds, the probability of c is one third, and so on. [00:30:15] Okay, so let me summarize now. We've talked about learning in fully supervised Bayesian networks, where we're observing instances of all the variables here. So one important concept to take away is this idea of parameter sharing.
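For the curious, the Lagrange-multiplier argument the lecturer alludes to goes roughly like this (a sketch, with count notation mine). Each local subproblem has the form

```latex
\max_{p(\cdot)}\ \prod_{v} p(v)^{\mathrm{count}(v)}
\quad \text{subject to} \quad \sum_{v} p(v) = 1 .
```

Taking logs gives the Lagrangian $\sum_v \mathrm{count}(v)\log p(v) + \mu\,(1 - \sum_v p(v))$; setting the derivative with respect to each $p(v)$ to zero yields $p(v) = \mathrm{count}(v)/\mu$, and the sum-to-one constraint forces $\mu = \sum_{v'} \mathrm{count}(v')$, so $p(v) \propto \mathrm{count}(v)$: exactly count and normalize.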
[00:30:34] We have talked about a Bayesian network, which in inference doesn't care about where these parameters come from, but we should really think about each of these nodes as being powered by a particular local conditional distribution, and sometimes two variables could be powered by the same distribution. And again, inference is reading from the parameters, right, while learning is writing to the parameters, in which case it matters where these arrows come from. [00:31:09] Secondly, we looked at the maximum likelihood principle, which is this kind of high-minded principle that says: maximize the likelihood of your data. And we showed that this is equal to the very pragmatic, simple, and intuitive principle of counting and normalizing. And it is this simplicity which makes Bayesian networks, especially naive Bayes, still very practical, useful, and interpretable. That's the end.
================================================================================
LECTURE 039
================================================================================
Bayesian Networks 8 - Smoothing | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=M7rWvN_0xbw
---
Transcript
[00:00:05] Hi. In this module, I'm going to talk about Laplace smoothing for Bayesian networks. So let's review maximum likelihood estimation. [00:00:15] Remember, last time we had an example of a two-variable network: the genre of a movie and the rating of the movie, where their joint distribution is given by the probability of the genre times the probability of the rating given the genre. Now, we don't know these parameters, but we want to estimate them from data. Suppose we gather five data points here. The way that maximum likelihood estimation works is by counting and normalizing. The parameters here are: the probability of g, and for that I'm going to count the number of times each value of g shows up, and normalize.
r given g i'm going to look at the number of times each of [00:01:00] look at the number of times each of these configurations shows up [00:01:02] these configurations shows up and then i'm going to normalize [00:01:04] and then i'm going to normalize each one condition on the value [00:01:10] so if you look at these estimates you [00:01:12] so if you look at these estimates you might notice that there's something [00:01:14] might notice that there's something funny going on here [00:01:16] funny going on here so [00:01:17] so the probability that these [00:01:20] the probability that these parameters assigned to a rating of two [00:01:23] parameters assigned to a rating of two given that there's a comedy is zero it [00:01:26] given that there's a comedy is zero it doesn't show up in this row of this [00:01:28] doesn't show up in this row of this table which means that it's zero [00:01:30] table which means that it's zero so do we really believe this [00:01:33] so do we really believe this just because we didn't see an example of [00:01:35] just because we didn't see an example of a comedy being rated as a two are we [00:01:38] a comedy being rated as a two are we licensed to just give it a probably zero [00:01:40] licensed to just give it a probably zero well that would be very close my dead [00:01:43] well that would be very close my dead so this is a case where maximum [00:01:44] so this is a case where maximum likelihood has overfit [00:01:49] there's a very simple way to fix this [00:01:51] there's a very simple way to fix this called laplace smoothing [00:01:53] called laplace smoothing and the idea is that we're just going to [00:01:55] and the idea is that we're just going to add [00:01:56] add a lambda [00:01:58] a lambda which is some positive value let's say [00:02:00] which is some positive value let's say one to each count so let's do a maximum [00:02:03] one to each count so let's do a maximum likelihood with laplace movement [00:02:06] likelihood 
with laplace movement so training data is the same as before [00:02:08] so training data is the same as before and what we're going to do is for each [00:02:10] and what we're going to do is for each of these local distributions we're going [00:02:13] of these local distributions we're going to preset pre-load a 1 [00:02:16] to preset pre-load a 1 labda more generally into this position [00:02:20] labda more generally into this position and now i'm going to go through the [00:02:21] and now i'm going to go through the training and count as usual so i add [00:02:23] training and count as usual so i add three and i add two [00:02:25] three and i add two and then i'm going to normalize [00:02:27] and then i'm going to normalize over these combined counts [00:02:30] over these combined counts and same with uh the probability of r [00:02:34] and same with uh the probability of r given [00:02:35] given g for each of these [00:02:38] g for each of these configurations so now i have to actually [00:02:40] configurations so now i have to actually instantiate all possible configurations [00:02:43] instantiate all possible configurations i'm going to load a one [00:02:46] i'm going to load a one into each of these counts [00:02:49] into each of these counts and then i'm going to look at my [00:02:50] and then i'm going to look at my training data [00:02:52] training data and i'm going to add two there's two d4s [00:02:55] and i'm going to add two there's two d4s one d5 one c1 and one c5 [00:02:58] one d5 one c1 and one c5 now given these counts i'm going to [00:03:00] now given these counts i'm going to normalize to get my probability limit [00:03:03] normalize to get my probability limit look at all the d's count them up [00:03:05] look at all the d's count them up normalize i get [00:03:07] normalize i get um some [00:03:08] um some of these here and look at all the case [00:03:12] of these here and look at all the case rows where g c [00:03:14] rows where g c i'm going to 
[00:03:16] i'm going to sum in [00:03:18] sum in so now [00:03:20] so now what we revisit our probability estimate [00:03:22] what we revisit our probability estimate of r equals two given uh g equals c [00:03:26] of r equals two given uh g equals c this was zero before [00:03:28] this was zero before but now it's uh [00:03:30] but now it's uh here that's one over seven which is [00:03:32] here that's one over seven which is greater than zero [00:03:34] greater than zero so now because we smooth these estimates [00:03:36] so now because we smooth these estimates now we have a little bit more [00:03:38] now we have a little bit more probability on even [00:03:40] probability on even those outcomes that we've never seen [00:03:43] those outcomes that we've never seen during training [00:03:46] during training the key idea behind maximum likelihood [00:03:48] the key idea behind maximum likelihood laplace smoothing is follows [00:03:51] laplace smoothing is follows so we're going to go through each [00:03:53] so we're going to go through each distribution [00:03:55] distribution and partial assignment [00:03:57] and partial assignment uh to [00:03:59] uh to the parents of a node and the node [00:04:01] the parents of a node and the node itself [00:04:02] itself and we're simply going to add lambda [00:04:04] and we're simply going to add lambda to the count [00:04:07] to the count now we do maximum likelihood estimation [00:04:09] now we do maximum likelihood estimation as usual so we're going to go through [00:04:12] as usual so we're going to go through the training data and increment those [00:04:14] the training data and increment those counts based on what we saw [00:04:16] counts based on what we saw and then we count and normalize and [00:04:17] and then we count and normalize and that's it [00:04:19] that's it so the interpretation that we can place [00:04:22] so the interpretation that we can place on the plot smoothing is it's like we [00:04:24] on the plot 
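The count-and-normalize recipe with preloaded pseudocounts can be sketched as follows; the function name and data layout are mine, but the five data points are the lecture's (two d4s, one d5, one c1 and one c5), and with lambda = 1 the never-seen configuration (c, 2) comes out to 1/7 as in the lecture:

```python
def laplace_estimate(data, genres, ratings, lam=1.0):
    """Maximum likelihood with Laplace smoothing: preload lambda, count, normalize."""
    # P(g): preload lambda for each genre, then add observed counts.
    g_counts = {g: lam for g in genres}
    # P(r | g): preload lambda into every (genre, rating) configuration.
    r_counts = {g: {r: lam for r in ratings} for g in genres}
    for g, r in data:
        g_counts[g] += 1
        r_counts[g][r] += 1
    # Normalize the combined (pseudo + observed) counts.
    p_g = {g: c / sum(g_counts.values()) for g, c in g_counts.items()}
    p_r_given_g = {
        g: {r: c / sum(row.values()) for r, c in row.items()}
        for g, row in r_counts.items()
    }
    return p_g, p_r_given_g

# The lecture's five data points: two d4s, one d5, one c1, one c5.
data = [("d", 4), ("d", 4), ("d", 5), ("c", 1), ("c", 5)]
p_g, p_r = laplace_estimate(data, genres=["d", "c"], ratings=[1, 2, 3, 4, 5])
print(p_r["c"][2])  # 1/7: nonzero even though (c, 2) never occurred
```

Setting `lam=0` recovers plain maximum likelihood, where `p_r["c"][2]` would be exactly zero.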
[00:04:19] The interpretation we can place on Laplace smoothing is that it's as if we hallucinated lambda occurrences of each local assignment. Sometimes these lambda counts are called pseudocounts, because they're not based on the data; they're made-up, virtual counts. So you can think of it as pretending you saw some examples before you saw the data, and then doing maximum likelihood estimation.

[00:04:48] So how much should lambda be? How much smoothing should we have, and how does it interact with the data? There are two observations I want to make. The first is that the more you smooth, which means the bigger lambda is, the closer you push the probability estimates to the uniform distribution. For example, if I smooth with lambda equals one-half and I observe only a d, then the probability estimates are going to be three-quarters and one-quarter, whereas if I smooth with one, then the probabilities are going to be two-thirds and one-third, which is closer to half-half.

[00:05:29] The second observation I want to make is that no matter what you set lambda to, the data wins out in the end. Suppose we only see examples of dramas: if we're smoothing with lambda equals one and we saw only one example of g = d, then again the probability estimates are two-thirds and one-third. But suppose we keep on seeing dramas over and over again, so we see 998 of them. Now if we count and normalize, we get a probability estimate of 0.999 for drama, which is much closer to what seeing only dramas suggests.

[00:06:16] So, to summarize: we looked at Laplace smoothing for avoiding overfitting when estimating Bayesian networks. The key idea is that we preload the counts with a lambda, then go through the training data and add counts based on our data, and then we normalize. So the smoothing pulls us away from zeros toward the uniform distribution, but in the end all the smoothing gets washed out with more data. That's the end.

================================================================================ LECTURE 040 ================================================================================ Bayesian Networks 9 - EM Algorithm | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=CPVFJBd-Qcg --- Transcript

[00:00:06] Hi, in this module I'm going to talk about the EM algorithm for learning Bayesian networks when we have unobserved variables in our training data. So let's start with our familiar movie rating example. Remember this Bayesian network: we have a genre, which could be drama or comedy, and we have two people, Jim and Martha, who are going to produce ratings of this movie, which we'll denote R1 and R2. And before, when we observed all the variables in our training data, we could just use maximum likelihood, which amounts to counting and normalizing.
[00:00:43] But this only works if we observe all the variables in each training example, and data collection is expensive. What happens if we don't observe some of the variables? For example, what happens if we don't know the genre of the movies, and we only observe the pairs of ratings from Martha and Jim? So what can we do in this case? Intuitively, it seems kind of hopeless: how can we learn a Bayesian network relating G and R when we don't even see examples of G? But we'll show that this is actually possible in many cases (certainly not all cases), and that's kind of the magic of EM and of unsupervised learning in general.

[00:01:31] So let's try to approach this problem top-down. What are the principles we have? Well, maximum likelihood served us quite well, so let's see if we can make that work. Generally, we have a set of variables which are hidden, called big H, and we also have some variables, big E, which are observed. In this movie rating example, G is the hidden variable, the two ratings are the observed variables, and we have some little e denoting what they're observed to be. And in this case, remember, we have the set of parameters, which is the probability of G and the probability of R given G.

[00:02:15] The principle of maximum marginal likelihood says: well, just maximize the probability of the data; tweak the parameters to make that probability as high as possible. What this means for us is that we're going to try to find the theta that maximizes the product, over all the observations e in the training data, of the probability of that observation given theta. This looks very much like maximum likelihood, with the exception that we are marginalizing out the hidden variables: just to spell it out, that quantity is really the summation, over possible values h of the hidden variables, of P(H = h, E = e; theta).

[00:03:13] Okay, so this is the principle we want to adhere to. It turns out that the EM algorithm is one way of trying to optimize this objective, but we're going to motivate EM in a more intuitive way. So you should think about EM as a generalization of the k-means algorithm. Remember, in k-means for clustering we also had a similar problem, where we have cluster centroids and cluster assignments, both of which we didn't know. In our case, the cluster centroids are going to be generalized to the parameters of a Bayesian network in general, and the cluster assignments are going to be generalized to the hidden variables.

[00:04:01] So here are the variables we have, E and H, and here is the expectation maximization algorithm, otherwise known as EM. We first initialize the parameters randomly, and then we repeat until convergence, alternating between two steps, the E-step and the M-step. In the E-step, we try to use the parameters to guess the hidden variables: we compute q(h), which is a distribution over the possible values the hidden variables could take on, and this is simply equal to the probability of the hidden variables conditioned on the evidence, or observations, that we saw. Again, this depends on the parameters at the current iteration, and we do this for every possible value of h. How do we do this? Well, we've already seen how to compute these kinds of quantities given a fixed Bayesian network: this is called probabilistic inference. In case H is small, we can just do it by brute force; if the network is an HMM, we can use forward-backward; in general, we can use Gibbs sampling, etc.

[00:05:30] So now what do we have? We have these weights for every h, and now we can create fully observed examples: we pair a particular h with our observations and put a weight next to that example. The important thing is that we now have a set of weighted examples which are fully observed, and we know how to deal with fully observed examples: we can do maximum likelihood. So in the M-step we take these weighted examples, and then we just count and normalize, and that gives us a fresh set of parameters, with which we can go back and repeat the E-step and the M-step over and over again.

[00:06:20] So the EM algorithm is guaranteed to converge to a local optimum, just like k-means, but it can get stuck in a local optimum and not actually solve the global optimization problem.
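The E-step/M-step loop for the genre-and-two-ratings network can be sketched as follows. The function names, data layout, and iteration count are mine; the initialization shown is the one used in the lecture's worked example, and the hidden variable is small enough to enumerate by brute force:

```python
# Sketch of EM for the network G (hidden) -> R1, R2 (observed).
GENRES = ("c", "d")
RATINGS = (1, 2)

def e_step(data, p_g, p_r):
    """Guess the hidden genre: for each example, q(g) = P(g | r1, r2)."""
    weighted = []  # fully observed, weighted examples (g, r1, r2, weight)
    for r1, r2 in data:
        # Brute-force enumeration of the joint over the small hidden variable.
        joint = {g: p_g[g] * p_r[g][r1] * p_r[g][r2] for g in GENRES}
        z = sum(joint.values())  # normalizing constant
        weighted.extend((g, r1, r2, joint[g] / z) for g in GENRES)
    return weighted

def m_step(weighted):
    """Count (fractionally) and normalize, just like maximum likelihood."""
    g_cnt = {g: 0.0 for g in GENRES}
    r_cnt = {g: {r: 0.0 for r in RATINGS} for g in GENRES}
    for g, r1, r2, w in weighted:
        g_cnt[g] += w
        r_cnt[g][r1] += w  # each rating contributes the example's weight
        r_cnt[g][r2] += w
    p_g = {g: c / sum(g_cnt.values()) for g, c in g_cnt.items()}
    p_r = {g: {r: c / sum(row.values()) for r, c in row.items()}
           for g, row in r_cnt.items()}
    return p_g, p_r

# Alternate E-step and M-step; EM converges to a local optimum.
data = [(2, 2), (1, 2)]
p_g = {"c": 0.5, "d": 0.5}
p_r = {"c": {1: 0.4, 2: 0.6}, "d": {1: 0.6, 2: 0.4}}
for _ in range(10):
    p_g, p_r = m_step(e_step(data, p_g, p_r))
```

A Laplace-style lambda could be preloaded into the M-step counts to keep the smoothing benefits from the previous module.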
[00:06:37] So let's do an example. We're just going to do one iteration of EM on our sample Bayesian network. Suppose our training data includes two examples, one where (R1, R2) = (2, 2) and a second where it's (1, 2), and the genre is unobserved. Okay, so suppose we have parameters that look like this: the probability of G is just uniform, and the probability of R given G is given by this table.

[00:07:15] Okay, so now we do the E-step. Remember, the E-step is trying to guess what G is for each of these examples, because we don't know it. So let's look at (2, 2), the first example. Well, G could be either c or d, so there are two possibilities, and for each one I'm going to compute the probability of the joint assignment. By the definition of the Bayesian network, this is the probability of g = c, which is 0.5, times the probability of R1 = 2 given g = c, which is 0.6, times the probability of R2 = 2 given g = c, which is also 0.6; that gives me 0.18. And now we look at the other possibility, g = d: the probability of g = d is 0.5, the probability of R1 = 2 given g = d is 0.4, and the probability of R2 = 2 given g = d is another 0.4. Okay, so now I have these probabilities next to each of the possible extensions of this assignment, and I can normalize: that's how I get my q distribution. If I normalize, I get 0.69 and 0.31; there's more probability mass on g = c than on g = d.

[00:09:19] Okay, so now I move on to the second data point, and I'm going to do the same thing. (1, 2) could go with either c or d, and I compute the probability of each possible assignment to G: I have the probability of g = c, which is 0.5, times the probability of R1 = 1 given g = c, which is 0.4, times the probability of R2 = 2 given g = c, which is 0.6. Analogously, I can compute the same quantity for g = d, and again I normalize, and I get 0.5 and 0.5.

[00:10:06] Okay, so at this point, at the end of the E-step, what I have are four fleshed-out data points. I started with two data points, but they've been expanded into the possible continuations of G, and each data point is weighted by some probability q(g), which is essentially how much I believe that data point is valid, in some sense.

[00:10:36] Okay, so now we move on to the M-step, and the M-step is just going to take these four data points, count them up, and normalize; this should be very familiar. So first we estimate the probability of G. G can take on two values, c and d, so I count them up. How many times did g = c occur? Well, it shows up in the first and third data points, and I just add their weights together, which is 0.69 and 0.5. And what about d? Well, g = d shows up in the second and fourth rows, and that's 0.31 + 0.5. And then I just normalize this into an actual distribution.

[00:11:28] So now I move on to the probability of R given G, and for each possible configuration here I'm going to count. So c1 shows up here once, and that has a weight of 0.5. What about c2? c2 shows up three times: once here with R1, once with R2, and once down here. If I add the weights of those, I get 0.5 + 0.69 + 0.69; notice that the first example is used twice, because it generates the rating two twice from c. Okay, so now I have these counts, and I normalize this distribution to get the distribution of R given g = c.

[00:12:28] So now I move on to what happens when G is d. I look at d1: d1 shows up once here, with weight 0.5. And what about d2? Well, that shows up three times, twice here (2 × 0.31) and then once here, where I have another 0.5. I add, normalize, and I get a distribution.

[00:12:54] So the only difference between maximum likelihood and the M-step is that now I'm adding these fractional counts rather than integer counts, but otherwise the logic and the code are exactly the same.

[00:13:07] So what have we done? Stepping back a little bit: intuitively, we've gone from a preliminary set of parameters to guessing what G is, and then we've used that guess of G to further refine our estimate of the parameters. And you'll see that the parameters over here were 0.4 and 0.6, and now they've been pushed to about 0.2 and 0.8. So in general, EM tends to polarize the probabilities, because that's the best way to maximize the likelihood of the data.
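The arithmetic in this walkthrough can be checked directly; this sketch (variable names are mine) reproduces the E-step weights for the (2, 2) example and the M-step estimate of P(R = 2 | G = c):

```python
# E-step for the first example, (r1, r2) = (2, 2):
jc = 0.5 * 0.6 * 0.6          # P(g=c) * P(r1=2|c) * P(r2=2|c) = 0.18
jd = 0.5 * 0.4 * 0.4          # P(g=d) * P(r1=2|d) * P(r2=2|d) = 0.08
qc = jc / (jc + jd)           # ~0.69, the weight on g = c
qd = jd / (jc + jd)           # ~0.31

# E-step for the second example, (1, 2): the two joints are equal
# (0.5*0.4*0.6 vs. 0.5*0.6*0.4), so its weights are 0.5 and 0.5.

# M-step fractional counts for P(r | g=c): the (2,2) example contributes
# its weight twice (both ratings are 2); the (1,2) example contributes
# 0.5 to r=1 (via R1) and 0.5 to r=2 (via R2).
c1 = 0.5
c2 = 2 * qc + 0.5
p2_given_c = c2 / (c1 + c2)   # ~0.79: pushed from 0.6 toward 0.8
```

This is exactly the "count and normalize" code from maximum likelihood, just with fractional weights in place of integer counts.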
probabilities because that's the best way to maximize [00:13:45] because that's the best way to maximize the likelihood of the data and now this [00:13:49] the likelihood of the data and now this is just one iteration of BM now I would [00:13:51] is just one iteration of BM now I would take these parameters and go through the [00:13:53] take these parameters and go through the same process and go through the same [00:13:54] same process and go through the same process until I've um converted [00:14:01] okay so now let's turn to an interesting [00:14:04] okay so now let's turn to an interesting application of of em and that's [00:14:08] application of of em and that's decipherment so this is an example of a [00:14:10] decipherment so this is an example of a ciphers it's called a copal cipher 105 [00:14:14] ciphers it's called a copal cipher 105 page encrypted volume dating back from [00:14:16] page encrypted volume dating back from the 1730s it looks like this so for a [00:14:19] the 1730s it looks like this so for a long time no one knew what was what [00:14:22] long time no one knew what was what these words were um it was finally [00:14:26] these words were um it was finally cracked in [00:14:27] cracked in 2011 with the help of Em by Kevin Knight [00:14:31] 2011 with the help of Em by Kevin Knight an NLP researcher so the kobio cipher is [00:14:34] an NLP researcher so the kobio cipher is actually very complex so what we're [00:14:36] actually very complex so what we're going to do is motivate the idea of [00:14:39] going to do is motivate the idea of using basan networks for decipherment [00:14:41] using basan networks for decipherment with a simple substitution [00:14:43] with a simple substitution Cipher so the idea behind a substitution [00:14:46] Cipher so the idea behind a substitution Cipher is that suppose you wanted to [00:14:49] Cipher is that suppose you wanted to send an encrypted message to someone so [00:14:51] send an encrypted message to someone 
[00:14:54] You're going to generate a substitution table, which specifies how each letter gets transformed into another letter; the cipher is going to be a permutation of all the letters. Then you have a message you want to send; suppose you want to say "hello world". You use this substitution table and apply it to the plain text to produce a cipher text: h maps to n, e to m, l to y (and l to y again), o to t, and so on. Now you hide the substitution table and hand someone the cipher text, or you put it in a book and bury it for someone to discover later. [00:15:42] So the question is: given only the cipher text, can someone recover the plain text? Importantly, the plain text is obviously unknown, but the substitution table is also unknown. So this is a very challenging problem.
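As an aside, the substitution step just described is easy to sketch in code. The following is an illustrative sketch, not the course's code; the table here is a randomly drawn permutation of the alphabet rather than the specific table shown in the lecture:

```python
import random
import string

def make_substitution_table(seed=0):
    # A substitution cipher is a permutation of the alphabet (plus space).
    # This random table is a stand-in for the lecture's hand-made one.
    alphabet = list(string.ascii_lowercase + " ")
    shuffled = alphabet[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(alphabet, shuffled))

def encrypt(plaintext, table):
    # Apply the substitution letter by letter to produce the cipher text.
    return "".join(table[c] for c in plaintext)

def decrypt(ciphertext, table):
    # With the table in hand, decryption just inverts the permutation.
    inverse = {v: k for k, v in table.items()}
    return "".join(inverse[c] for c in ciphertext)

table = make_substitution_table()
cipher = encrypt("hello world", table)
assert decrypt(cipher, table) == "hello world"
```

Encryption and decryption are inverse lookups into the same permutation; the decipherment problem is hard precisely because the receiver has neither `table` nor its inverse.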
[00:16:06] But let's see how we can use Bayesian networks, in particular HMMs, to try to address this. Remember, to use an HMM you have to think about the generative story of how this data arose. I'm going to model it as follows: I have a sequence of letters which are the plain text, and these are hidden, and a corresponding sequence of characters in the cipher text. I define a joint distribution over all of these by first generating the plain text letters according to a Markov model, via a start distribution and a bunch of transitions, and then for each plain text letter I generate a cipher text letter via some emission. So the parameters of the HMM, remember, are the start probability, the transition probability, and the emission probability. [00:17:12] Intuitively, the transitions are going to
[00:17:16] capture the cohesion of the plain text, because it's actually supposed to be readable and have structure rather than random letters, while the emission distribution is going to capture the substitution table. [00:17:30] So how are we going to estimate this HMM? First of all, we're going to make some simplifying choices here, but we'll show that they're essentially sufficient. We set p_start to uniform; you could be a little more clever, but I'll just leave it alone for simplicity. Then the transition probabilities: these specify a bigram model over characters, and this model tells you what looks like English and what doesn't. The really cool thing is that if we know the plain text is supposed to be English, we can just grab a ton of English text and estimate
[00:18:22] a distribution over that text, and that gives us p_trans; we don't even look at the cipher text. Finally, the key part is that the emission distribution is the substitution table, and that's what we're going to estimate with EM. Notice that p_emit is actually more general than a substitution: it says that for every plain text character we can generate a distribution over cipher text letters, whereas a substitution table says there's exactly one. This is more out of convenience, because it makes the optimization easier, but in principle you could also think of p_emit as constrained to a one-to-one mapping. [00:19:07] Okay, so why do we think this will work, intuitively? Well, the transition distribution, which we've already estimated on English, is going to favor plain text that looks like English.
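The count-and-normalize estimate of the bigram transition model from fully observed English text might look like the following sketch. The tiny corpus string is a stand-in for the course's lm.train file, and the add-one smoothing is my addition (not from the lecture) to avoid zero-probability rows:

```python
import string

ALPHABET = string.ascii_lowercase + " "  # 26 letters plus space, K = 27
K = len(ALPHABET)

def estimate_transitions(text):
    # Count consecutive character pairs (h1 -> h2) in fully observed text...
    counts = [[0.0] * K for _ in range(K)]
    ids = [ALPHABET.index(c) for c in text]
    for h1, h2 in zip(ids, ids[1:]):
        counts[h1][h2] += 1
    # ...then normalize each row into a conditional distribution p(h2 | h1),
    # with add-one smoothing so unseen pairs keep a small probability.
    probs = []
    for row in counts:
        total = sum(row) + K
        probs.append([(c + 1) / total for c in row])
    return probs

trans = estimate_transitions("the quick brown fox jumps over the lazy dog")
```

Each row of `trans` is the distribution over the next character given the current one, which is exactly the p(h2 | h1) array layout the live-coding below uses.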
[00:19:27] Meanwhile, the emission distribution is going to try to favor consistent character substitutions: we don't want 'a' to map to a 't' here, a 'v' there, and an 'f' somewhere else. We want some consistency, and by having this emission distribution and maximizing likelihood, EM is going to encourage that kind of consistency. So we have these two forces at play with each other while we're trying to estimate both the hidden variables and the parameters. [00:19:56] Okay, so let's actually step into the EM algorithm and see what computations are needed to estimate this HMM. In the E-step, I need to compute the distribution over the hidden variables conditioned on the observations, and to do that we introduced the forward-backward algorithm a while back. The forward-backward algorithm computes these smoothing queries, which is exactly: what's the probability of a plain text letter being a
[00:20:37] particular value h, given the cipher text that we observe? I'm going to do this for each position in the cipher text and every potential character, so I define q_i(h) to be this probability. This is my best guess, at a particular location, of what I think the plain text character is. Now, given these guesses, the M-step is going to re-estimate the substitution table, i.e. the emission distribution: I compute a fractional count and normalize, over all character pairs (h, e). For every possible plain text letter h and every cipher text letter e, I look at all the positions i where the cipher text was actually e, and I add the weight q_i(h). This tells me how many times, in expectation, we believe that a particular plain text letter and
[00:21:59] a particular cipher text letter occur together. Then I just normalize this distribution: p_emit of a cipher text letter given a plain text letter is proportional to this count, count_emit(h, e). [00:22:14] Okay, so that's it: we just run the EM algorithm and hope for the best. To make this a little more exciting, I'm going to try to code this up in Python so we can see it in action. All right, a few things first. Here is our cipher text; you shouldn't be able to read it, but we're going to try to decipher it. And we also have this lm.
train, which is a file containing a large amount of English text that we can draw from. [00:22:55] We also have a utility file, which I'll just review: it lets you read text, we convert text into a sequence of integers for simplicity, and importantly we have implemented the forward-backward algorithm, which takes a sequence of observations and the parameters of the HMM and returns Q, a two-dimensional array where Q[i] is a distribution over the possible values of h_i for each position i. [00:23:29] Okay, so let's decipher some cipher text. I import the utilities, and I declare K to be the number of characters; this is the lowercase letters plus space (I've normalized the text). The first thing I want to do is initialize the HMM.
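The forward-backward routine is treated as a given utility in the lecture; a plausible minimal implementation of the smoothing computation it is described as performing (my reconstruction, not the course's actual utility code) is:

```python
def forward_backward(obs, start_probs, trans_probs, emit_probs):
    """Return Q, where Q[i][h] = P(H_i = h | all observations).

    start_probs: list of K floats; trans_probs, emit_probs: K x K, row-stochastic.
    """
    K = len(start_probs)
    n = len(obs)
    # Forward pass: alpha[i][h] proportional to P(H_i = h, e_1..e_i).
    alpha = [[0.0] * K for _ in range(n)]
    for h in range(K):
        alpha[0][h] = start_probs[h] * emit_probs[h][obs[0]]
    for i in range(1, n):
        for h2 in range(K):
            alpha[i][h2] = emit_probs[h2][obs[i]] * sum(
                alpha[i - 1][h1] * trans_probs[h1][h2] for h1 in range(K))
        # Normalize each step to avoid numerical underflow on long sequences.
        z = sum(alpha[i]) or 1.0
        alpha[i] = [a / z for a in alpha[i]]
    # Backward pass: beta[i][h] proportional to P(e_{i+1}..e_n | H_i = h).
    beta = [[1.0] * K for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for h1 in range(K):
            beta[i][h1] = sum(trans_probs[h1][h2] * emit_probs[h2][obs[i + 1]]
                              * beta[i + 1][h2] for h2 in range(K))
        z = sum(beta[i]) or 1.0
        beta[i] = [b / z for b in beta[i]]
    # Combine and normalize: Q[i][h] = alpha[i][h] * beta[i][h] / Z_i.
    Q = []
    for i in range(n):
        row = [alpha[i][h] * beta[i][h] for h in range(K)]
        z = sum(row) or 1.0
        Q.append([r / z for r in row])
    return Q

# Tiny two-state demo: state 0 prefers symbol 0, state 1 prefers symbol 1.
Q = forward_backward([0, 0, 1], [0.5, 0.5],
                     [[0.9, 0.1], [0.1, 0.9]],
                     [[0.8, 0.2], [0.2, 0.8]])
```

Each row of Q sums to one, giving the per-position posterior over hidden characters that the E-step below consumes.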
the parameters of hmm I [00:24:03] hmm so remember the parameters of hmm I have start um [00:24:06] have start um probabilities so this is going to be P [00:24:09] probabilities so this is going to be P start of H and I'm just going to set [00:24:12] start of H and I'm just going to set this to the uniform distribution and all [00:24:14] this to the uniform distribution and all the DAT so start probs equals 1 / k [00:24:19] the DAT so start probs equals 1 / k for um H and range uh [00:24:24] for um H and range uh K so that's going to be just the uniform [00:24:27] K so that's going to be just the uniform distribution [00:24:32] okay so now what about the transition [00:24:35] okay so now what about the transition probabilities so transition probably [00:24:37] probabilities so transition probably goes from H1 to [00:24:38] goes from H1 to H2 um and this is p of H2 given H1 so [00:24:43] H2 um and this is p of H2 given H1 so not the order is Switched here because I [00:24:46] not the order is Switched here because I want transition problems of each one to [00:24:48] want transition problems of each one to be actually an array which specifies the [00:24:50] be actually an array which specifies the distribution ARR [00:24:52] distribution ARR two um so here we're going [00:24:55] two um so here we're going to uh estimate this from um plain [00:25:01] to uh estimate this from um plain text so I'm going to have raw text this [00:25:05] text so I'm going to have raw text this is I'm going to read it from lm. [00:25:08] is I'm going to read it from lm. 
train, which we saw earlier, and convert it into an integer sequence. Let's see what that looks like: just a sequence of integers. [00:25:25] Okay, so now I estimate p_trans from this raw text. This is just a standard, fully observed estimation problem. I loop over all positions from one to the end, take h1 and h2 to be consecutive characters in the raw character sequence, and increment a counter. I define transition_counts to be zero for each h1 in range(K) and each h2 in range(K), so that's a K-by-K zero matrix, and I increment the count once for each consecutive pair. Then I normalize:
[00:26:38] the way I normalize is to define the transition probabilities as, for each h1, the result of calling normalize on transition_counts[h1]. For every h1 this gives me a distribution over h2; normalizing it gives my transition probabilities, and I'm done with transitions. [00:27:05] So what about the emission probabilities? Here, for every h I have a distribution over e; to write it out in our mathematical language, this is p(e | h). I just want to initialize it to the uniform distribution. To document this a little: the start distribution is uniform and done; the transitions are estimated from plain text and done; and the emissions are what we're going to estimate, so this is just an initialization. [00:27:48] I initialize it to, for each h
[00:27:58] in the domain of H and for each e in the same domain, the value 1/K, so this is a uniform distribution. And now I'm going to run EM to estimate only this emission distribution. [00:28:16] Okay, so to run EM, I load my cipher text: observations = read the cipher text file, converted into an integer sequence. Then I iterate a number of times, let's just say 200, and in each iteration I do the E-step and the M-step. So what happens in the E-step? I use my current setting of the parameters to guess what the plain text is, so I run forward-backward on the observations and pass in the parameters of the HMM. [00:29:18] This returns Q; just to note, in mathematical notation, Q[i][h] is the probability
that hi equal H um given the [00:29:35] at uh that hi equal H um given the evidence which is observations [00:29:38] evidence which is observations here print out best guess so far so [00:29:41] here print out best guess so far so let's see how we're doing we're going to [00:29:42] let's see how we're doing we're going to do this at each [00:29:44] do this at each iteration um so to do this so for [00:29:48] iteration um so to do this so for every um let's define n equals the [00:29:51] every um let's define n equals the length of uh the number of observations [00:29:54] length of uh the number of observations here so for each um [00:29:59] here so for each um position I'm going to look at Qi so this [00:30:02] position I'm going to look at Qi so this gives me a distribution over H and I'm [00:30:06] gives me a distribution over H and I'm going to take the one that has the [00:30:08] going to take the one that has the highest [00:30:11] highest probability so then I'm going to convert [00:30:14] probability so then I'm going to convert this to um [00:30:19] string and print it [00:30:24] out [00:30:26] out okay and now that finally the M step is [00:30:31] okay and now that finally the M step is we're just going to count and normalize [00:30:34] we're just going to count and normalize here so I'm going to define a new [00:30:38] here so I'm going to define a new temporary variable which is emission [00:30:40] temporary variable which is emission counts and this is going to be [00:30:44] counts and this is going to be um let me just actually cheat a little [00:30:47] um let me just actually cheat a little bit and I'm going to call emission [00:30:49] bit and I'm going to call emission counts to be zero for um the same [00:30:54] counts to be zero for um the same dimensionality as Mission props this is [00:30:57] dimensionality as Mission props this is a matrix of zero [00:31:00] okay so now we're going to up go through [00:31:03] okay so now we're going to up go 
[00:31:09] through each position i in range(n), and for each possible value h at that position, I update the emission counts. Emission, remember, is indexed by (h, e), so the update is emission_counts[h][observations[i]] += Q[i][h]. This is probably the most important line here: Q[i][h] is the weight on a particular h at position i, and that weight gets added to the count for h paired with the observation at that position. [00:31:59] Okay, so now all I need to do is normalize: the emission probabilities are, for each possible value of h, the normalization of emission_counts[h]. [00:32:16] Okay, so that's it. Just to review this briefly: I first initialize the HMM, with the start probabilities just uniform, and then I estimate the transition
[00:32:36] probabilities in a fully supervised way from plain text, where I simply count and normalize. Then I initialize the emission probabilities to uniform for now, and I run the EM algorithm to actually update these emission probabilities. So I read in the observations and iterate between the E-step and the M-step, where in the E-step I run forward-backward to compute the distribution over the possible values of h at each position and print out my best guess, and then in the M-step I count and normalize. [00:33:17] All right, so let's see how this does: let's run the decipher script. At each step it prints out its best guess, and over time you can see that this jumble of letters slowly evolves as EM tries to figure out both the plain text and the substitution table.
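The count-and-normalize M-step just summarized can be isolated as a small function. In this sketch, Q is a hand-made stand-in for the forward-backward output, and K = 2 symbols keeps the toy example readable:

```python
def m_step_emissions(obs, Q, K):
    # Fractional count: each position i contributes weight Q[i][h] to the
    # (plain letter h, cipher letter obs[i]) cell...
    counts = [[0.0] * K for _ in range(K)]
    for i, e in enumerate(obs):
        for h in range(K):
            counts[h][e] += Q[i][h]
    # ...then each row is normalized into p_emit(e | h).
    probs = []
    for row in counts:
        z = sum(row) or 1.0
        probs.append([c / z for c in row])
    return probs

# Toy example: 3 observed symbols and a made-up posterior Q over 2 plain letters.
obs = [0, 1, 0]
Q = [[0.9, 0.1], [0.2, 0.8], [0.9, 0.1]]
emit = m_step_emissions(obs, Q, K=2)
```

Because the posterior puts most weight on plain letter 0 wherever cipher symbol 0 appears, the re-estimated row emit[0] concentrates on symbol 0, which is the consistency pressure the lecture describes.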
[00:33:46] This isn't going to be perfect, because we've used a fairly simple model and we don't have much data, but you can see some structure emerging: "so i w my woke alone without" — "without", so that's a real word — "any one that i could really", and so on, and "plain", there's probably something there. [00:34:08] Just for comparison, here is the actual plain text; it's a little passage from The Little Prince: "So I lived my life alone, without anyone that I could really talk to, until I had an accident with my plane." So definitely far from perfect, but given that we just did it in ten minutes, it's maybe not bad. [00:34:34] Okay, so let me summarize. We've presented the EM algorithm for estimating the parameters of a Bayesian network when there are unobserved variables. The overarching principle is that of maximum marginal likelihood: we find the parameters that drive up the probability of the variables that we did observe
variables that we did observe as much as possible [00:34:58] much as possible so the EM algorithm is going to optimize [00:35:02] so the EM algorithm is going to optimize the marginal likelihood objective but [00:35:05] the marginal likelihood objective but fundamentally it's a chicken and egg [00:35:07] fundamentally it's a chicken and egg problem just like in kme right we don't [00:35:09] problem just like in kme right we don't know the hidden variables and we also [00:35:11] know the hidden variables and we also don't know the parameters so what we're [00:35:14] don't know the parameters so what we're going to do is to iterate between one [00:35:17] going to do is to iterate between one and the other so in the Eep we're going [00:35:20] and the other so in the Eep we're going to perform problemistic inference given [00:35:22] to perform problemistic inference given a fixed set of parameters to produce my [00:35:25] a fixed set of parameters to produce my our best guess over what some of the [00:35:28] our best guess over what some of the Hidden variables are and then in the M [00:35:30] Hidden variables are and then in the M step we're going to use these these [00:35:34] step we're going to use these these probabilities as weights of examples and [00:35:37] probabilities as weights of examples and then we're just going to count and [00:35:38] then we're just going to count and normalize to parameters and then we go [00:35:42] normalize to parameters and then we go estimate the hidden variables and [00:35:43] estimate the hidden variables and estimate the parameters and so on and so [00:35:46] estimate the parameters and so on and so forth so finally once you've learned [00:35:49] forth so finally once you've learned your beija network you can go off and [00:35:51] your beija network you can go off and perform inference and answer all sorts [00:35:53] perform inference and answer all sorts of questions which could involve asking [00:35:56] of questions which 
could involve asking about these unobserved variables that [00:35:59] you didn't see on new test examples or [00:36:01] it could be used to ask questions about [00:36:04] the observed [00:36:06] variables given some other variables [00:36:09] and in general this highlights kind of [00:36:11] the flexibility of Bayesian networks just [00:36:13] because you had a certain pattern of [00:36:16] missingness at training time doesn't mean [00:36:18] you have to commit to that at test [00:36:21] time so there's many applications of Bayesian [00:36:24] networks involving the EM [00:36:27] algorithm we looked at decipherment [00:36:28] where the goal is to infer the plain [00:36:30] text from the cipher text EM could also [00:36:33] be used to reconstruct phylogenetic [00:36:35] trees given the DNA of modern [00:36:38] organisms and it can also be used to [00:36:41] infer the unknown label of a data point [00:36:45] where the observations are the possibly [00:36:47] noisy labels provided by crowd workers [00:36:51] so finally EM is the most canonical [00:36:53] version of a broader class of techniques [00:36:56] called variational inference
[00:36:58] which actually includes things like [00:36:59] variational autoencoders which some of [00:37:01] you might have heard of um in that case [00:37:05] the q is actually the encoder and it's [00:37:09] given by a neural network and the [00:37:11] decoder is the Bayesian network so there's [00:37:15] a lot more connections to be explored [00:37:18] and I encourage you to read up on this [00:37:21] by yourself ================================================================================ LECTURE 041 ================================================================================ Logic 1 - Overview: Logic Based Models | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=oM5LUGPO7Zk --- Transcript [00:00:06] okay hi everyone so this week we are [00:00:09] going to be talking about logic so [00:00:11] this is our last set of modules and [00:00:13] we're going to switch from [00:00:15] variable-based models and start talking [00:00:17] about logic [00:00:19] so let's start with a question so let's [00:00:21] start with this question if x1 plus x2 [00:00:24] is equal to 10 and x1 minus x2 is equal [00:00:27] to 4 what is x1 okay [00:00:30] so [00:00:31] think about this for a few seconds so [00:00:33] the way you would go
about this is [00:00:35] you'll probably like use the thing that [00:00:36] you've used in algebra and then [00:00:38] basically like cancel out the x2s [00:00:41] and you would have like 2x1 equal to 14 [00:00:43] divide 14 by 2 and you'll end up getting [00:00:46] seven right [00:00:47] another way of looking at this problem [00:00:49] is you can think of this as a factor [00:00:50] graph this is actually a factor graph [00:00:52] and we have these constraints and one [00:00:54] way of solving this is to go and do [00:00:56] backtracking search and then actually [00:00:58] try to figure out the satisfying [00:01:00] assignment there but the problem [00:01:02] there is that might not be the most [00:01:04] efficient way of doing it and kind of [00:01:06] like the trick that you've learned in [00:01:07] algebra is probably a more efficient [00:01:09] way of dealing with this question so [00:01:12] that is kind of like a motivation for [00:01:14] why we are talking about logic could we [00:01:16] do logical inferences in a way that [00:01:18] makes our lives much easier and then [00:01:21]
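[Editor's aside: the two strategies just contrasted, algebraic manipulation versus searching for a satisfying assignment, can be sketched directly. The nested loop stands in for backtracking search; the 0..10 integer domain is my assumption, not the lecture's.]

```python
# Two ways to answer "if x1 + x2 == 10 and x1 - x2 == 4, what is x1?".

# Strategy 1: algebraic manipulation. Adding the two equations cancels x2,
# leaving 2 * x1 == 14, so x1 == 7.
x1_algebra = (10 + 4) // 2

# Strategy 2: search over assignments, checking both constraints, the way a
# naive CSP solver would. (Assumed domain: integers 0..10 for each variable.)
solutions = [(x1, x2)
             for x1 in range(11)
             for x2 in range(11)
             if x1 + x2 == 10 and x1 - x2 == 4]
```

Both strategies agree, but the algebraic one does constant work while the search enumerates the whole domain, which is the inefficiency the lecture points out.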
allow us to talk about expressions much [00:01:23] more concisely and allow us to move [00:01:25] around symbols and come up with [00:01:27] decisions come up with solutions based [00:01:30] on these sorts of logical inferences so [00:01:32] have this example in your mind [00:01:34] throughout the lecture because we're [00:01:36] using similar types of ideas when we talk [00:01:38] about logic and then doing inference in [00:01:40] logic okay [00:01:42] so if you remember our course plan right [00:01:44] we started with machine learning and [00:01:46] then we talked about in general reflex [00:01:47] and reflex-based models where [00:01:50] we have a low level of intelligence and [00:01:52] we started adding to these levels of [00:01:54] intelligence thinking about state-based [00:01:56] models and then thinking about variable [00:01:58] based models and finally we are [00:02:00] at logic when we are talking about a [00:02:02] higher level of intelligence and [00:02:04] expressivity when we think about [00:02:06] AI systems and just taking a step back [00:02:09] and
thinking about the paradigms that [00:02:11] we've used in this class we started [00:02:13] thinking about learning and modeling and [00:02:15] inference so the idea was we have some [00:02:17] sort of data and from that data we're going [00:02:19] to learn a model we're going to learn [00:02:22] this representation and then we're going [00:02:24] to be able to do inference on that model [00:02:26] so once we have a model once we have an [00:02:28] MDP [00:02:29] once we have a search problem we can [00:02:30] basically ask questions and that's [00:02:32] inference right you can basically ask [00:02:34] questions and infer an answer and [00:02:36] that allows us to think about different [00:02:38] types of inference algorithms okay [00:02:41] so as examples of that we talked about [00:02:43] search problems so when we have a search [00:02:45] problem the inference [00:02:47] problem that we were thinking about was [00:02:49] finding a minimum cost path or we were [00:02:51] talking about MDPs so in MDPs for [00:02:54] example or games we were thinking about [00:02:55] maximum value policies or we looked
[00:02:58] at CSPs or Bayesian networks where we're [00:03:01] looking at basically like what is the [00:03:03] probability of some query conditioned on [00:03:05] some sort of evidence so [00:03:07] these are some examples of inference [00:03:09] questions and inference problems that we [00:03:11] have looked at [00:03:12] throughout the different [00:03:14] lectures and modules that we have seen [00:03:17] so in modeling when you think about [00:03:19] modeling paradigms when we [00:03:21] have state-based models we thought about [00:03:23] search problems MDPs and games and [00:03:26] we basically thought about these in [00:03:28] terms of states actions and costs [00:03:30] right so those were kind of the main [00:03:32] core elements that would come into [00:03:35] our modeling when we were [00:03:36] thinking about state-based models and [00:03:38] applications of that were things of the [00:03:40] form of route finding or playing games [00:03:44] and then when we started thinking about [00:03:45] variable-based models we [00:03:47]
started defining this idea of variables and [00:03:49] factors and constraints between them and [00:03:52] then we talked about CSPs we talked [00:03:53] about Bayesian networks Markov networks [00:03:56] and applications of that were things [00:03:58] that were easier to think about [00:04:00] in terms of variables so we talked about [00:04:02] scheduling or tracking or medical [00:04:04] diagnosis where we have dependencies [00:04:07] between these different variables and [00:04:09] that was in variable-based models so [00:04:12] this week what we want to do is we [00:04:13] want to talk about logic-based models [00:04:16] and in logic-based models [00:04:18] similar to state-based models and [00:04:20] variable-based models we're going to [00:04:21] define a few different types of logic [00:04:23] that we are going to be using so [00:04:25] specifically we are going to be talking [00:04:27] about propositional logic and first [00:04:29] order logic and we're going to think in [00:04:32] terms of logical formulas and in [00:04:34] manipulating these logical formulas and [00:04:36] how we can infer new
formulas from them [00:04:39] so specifically how we think about [00:04:41] inference rules [00:04:42] and what are some applications of logic [00:04:44] well logic shows up in a variety of [00:04:46] applications starting from theorem [00:04:47] proving hardware and software [00:04:49] verification and also in general [00:04:52] reasoning it's a core element of [00:04:54] reasoning in artificial intelligence [00:04:57] so historically if you think about logic [00:04:59] old AI was very [00:05:02] highly dependent on logic so logic was [00:05:04] dominant in AI before the 1990s so the [00:05:08] same sort of excitement the same sort of [00:05:10] hype that is around deep learning today [00:05:12] that same sort of hype was around logic [00:05:15] before the 1990s and that was kind of like [00:05:17] the core of AI people were thinking [00:05:19] logic is going to really give us an [00:05:20] understanding of artificial intelligence [00:05:23] and developing artificial intelligence [00:05:25] that could really achieve [00:05:26] things that humans can [00:05:28] but
that didn't really pan out and [00:05:30] the reason it didn't pan out was logic [00:05:32] had a few problems so the first [00:05:34] problem was logic was deterministic [00:05:37] right and it couldn't really [00:05:39] handle uncertainty and that gave rise to [00:05:42] things of the form of probabilistic models [00:05:43] and friends and in general [00:05:45] understanding probabilities and adding [00:05:47] uncertainty on top of logic or [00:05:49] developing models that can capture [00:05:50] uncertainty beyond logic [00:05:53] the second problem with logic-based [00:05:55] models was that they were very rule-based [00:05:57] and they wouldn't allow [00:05:59] fine-tuning based on data so because [00:06:01] of that they were very brittle so [00:06:03] if I have new data that comes in and [00:06:06] tells me something else then that [00:06:07] rule-based model is not going to be able [00:06:09] to capture that and it's really hard to [00:06:11] incorporate information coming from new [00:06:14] data and again that gives rise to [00:06:15] machine learning and this [00:06:17]
idea of data-driven models and looking at data [00:06:20] and being able to learn [00:06:22] new models and being able to do [00:06:24] inference from that perspective okay [00:06:26] so these are kind of like the [00:06:28] problems one and two the weaknesses of [00:06:30] logic but in general logic has some sort [00:06:33] of strength that some of the models [00:06:35] today like some of the [00:06:36] state-of-the-art models don't [00:06:38] really have and I think there [00:06:39] is really an opportunity here to use [00:06:42] ideas from logic in some of the more [00:06:43] modern machine learning systems or some [00:06:45] of the more modern AI systems and the [00:06:47] strength of logic is expressivity so [00:06:51] we're going to be talking about this [00:06:53] throughout this week in general but [00:06:55] the nice thing that logic gives us is it [00:06:57] provides a compact way of expressing [00:07:00] models expressing representations that [00:07:03] we wouldn't normally be able to get [00:07:05] so this compact [00:07:07]
representation can be really powerful because we [00:07:08] could manipulate that compact [00:07:10] representation and that could allow us [00:07:13] to move on [00:07:14] to be able to infer new ideas and [00:07:17] new rules and so on and in general [00:07:19] that expressivity is a big strength [00:07:22] of logic and the reason that it is still [00:07:24] around and there is still excitement [00:07:25] around using it [00:07:28] all right so let me motivate [00:07:30] logic with an example so we've looked at [00:07:32] this example I think Percy [00:07:33] showed this example during the first [00:07:35] lecture where [00:07:37] our goal is we want to have a smart [00:07:39] personal assistant so let's say you're [00:07:41] sitting on the beach and this is after [00:07:42] the class and you're on vacation [00:07:44] after COVID and we're sitting on a [00:07:46] beach and we have a personal assistant [00:07:49] and what we want to do is we want to ask [00:07:51] our personal assistant maybe it's Siri [00:07:53] or maybe there's something fancier than [00:07:55] Siri and we
want to ask our personal [00:07:57] assistant a set of questions or maybe [00:07:59] you want to tell it some [00:08:00] information maybe you want to inform it [00:08:02] about something or ask it questions [00:08:04] okay so let's say we use natural [00:08:08] language so let's start with natural [00:08:09] language as a medium for talking to this [00:08:12] personal assistant okay so let's look [00:08:15] at an example here so this was the [00:08:17] system so let's say that this is my [00:08:19] system and I tell my system [00:08:21] all students like CS221 okay so [00:08:27] I'm telling it some information and then [00:08:29] my personal assistant says I learned [00:08:31] something okay [00:08:32] then I can say [00:08:34] Bob does not like [00:08:37] CS221 okay and then it would be like I [00:08:41] learned something and now based on this [00:08:43] knowledge that it has let's call [00:08:45] that a knowledge base so based on this [00:08:47] knowledge base that it has I can ask [00:08:49] this personal assistant questions I can [00:08:51] ask is [00:08:52] Bob a student [00:08:55] what
should it answer [00:08:57] so if it actually does inference right [00:08:59] it should answer no [00:09:01] right like if it has a set of [00:09:03] formulas and based on those it can infer [00:09:05] and it can reason then it should [00:09:06] actually be able to answer that [00:09:08] question underneath here there are a [00:09:11] bunch of formulas and there are a [00:09:13] bunch of inference rules we are going to [00:09:14] be talking about that but we [00:09:16] could take a look at that and see what [00:09:18] are the formulas it has access to and [00:09:20] what are the things that it is inferring [00:09:21] and it is inferring that Bob is not [00:09:24] a student here based on the things [00:09:26] that I've told it well this is kind of [00:09:28] the environment that we're [00:09:30] going to be talking about throughout [00:09:32] the lectures and other modules this week [00:09:34] all right so [00:09:36] let's go back here okay [00:09:38] so in general when we think about [00:09:41] having this personal
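[Editor's aside: the Bob exchange above is the kind of reasoning a knowledge base supports. A minimal sketch, my own toy encoding and not the course's inference engine, instantiates "all students like CS221" for Bob as a rule and answers the query by modus tollens.]

```python
# Toy knowledge base for the assistant example. The rule instantiates
# "all students like CS221" for bob: Student(bob) -> Likes(bob, cs221).
# All names here are illustrative, not from the lecture's slides.

facts = {"Likes(bob, cs221)": False}             # told: Bob does not like CS221
rules = [("Student(bob)", "Likes(bob, cs221)")]  # (premise, conclusion) pairs

def ask(query):
    """Return True/False if the KB entails an answer, else None ("I don't know")."""
    if query in facts:
        return facts[query]
    for premise, conclusion in rules:
        # Modus ponens: premise true, so conclusion is true.
        if query == conclusion and facts.get(premise) is True:
            return True
        # Modus tollens: conclusion false, so premise must be false.
        if query == premise and facts.get(conclusion) is False:
            return False
    return None
```

Asking `ask("Student(bob)")` returns `False`, matching the "no" the lecture says a reasoning assistant should give, while a query about an unknown individual returns `None`.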
assistant where we're using natural language or [00:09:44] where we're using logic the [00:09:46] idea is it should be able to digest [00:09:48] heterogeneous information and it should [00:09:51] also be able to reason deeply about that [00:09:53] information right it can't have just [00:09:55] shallow knowledge of that information it [00:09:57] should be able to do inference it should be [00:09:58] able to connect these different pieces [00:10:00] of information and make logical [00:10:02] statements based on that make logical [00:10:04] moves based on them okay so why [00:10:07] should we use natural language that's a [00:10:09] good question why natural language or [00:10:12] anything else okay [00:10:14] so natural language is kind of nice we [00:10:17] all like speaking natural language I'm [00:10:18] talking to you guys in natural language [00:10:20] it's a very nice medium to use when we [00:10:23] talk to personal assistants or when we [00:10:26] want to basically express [00:10:28] what we would like to say so it's a [00:10:30] very rich medium for expressing what we
[00:10:32] want [00:10:33] and because it is rich we can say [00:10:35] things like I don't know a dime is [00:10:37] better than a nickel and we can say [00:10:39] things like a nickel is better than a [00:10:41] penny [00:10:42] and based on that we would be able [00:10:45] to make expressions we could make [00:10:46] logical statements based on that and say [00:10:48] therefore a dime is better than a penny [00:10:50] which makes sense [00:10:52] but the problem with natural language is [00:10:55] it's also a little bit slippery like [00:10:57] I can start with something that says [00:11:00] a penny is better than nothing that's [00:11:02] okay and then I would have another [00:11:04] statement that just says nothing is [00:11:06] better than world peace and that's [00:11:07] perfectly fine and putting these two [00:11:09] together I can come up with a logical [00:11:13] kind of statement based on [00:11:15] what I've seen that a penny is better [00:11:17] than world peace which sounds a little [00:11:19] bit weird and not correct and not
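[Editor's aside: the slipperiness comes from natural language letting two different uses of "nothing" look like the same symbol. If we naively encode all four sentences as facts about one "better than" relation and close it under transitivity, we derive exactly the absurd conclusion; this toy encoding is mine, not from the lecture.]

```python
# Naive transitive reasoning over "better than" statements taken as symbols.
# "nothing" means different things in the two sentences, but the encoding
# can't tell, so transitivity yields nonsense.
better_than = {("dime", "nickel"), ("nickel", "penny"),
               ("penny", "nothing"), ("nothing", "world peace")}

def chains_to(a, b):
    """Does a chain of 'better than' facts connect a to b? (Assumes no cycles.)"""
    if (a, b) in better_than:
        return True
    return any(x == a and chains_to(y, b) for x, y in better_than)
```

Here `chains_to("dime", "penny")` is the sensible conclusion, and `chains_to("penny", "world peace")` is the weird one, which is the argument for a formal language where symbols have fixed meanings.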
the thing that I actually wanted okay [00:11:25] so even though natural language is [00:11:27] pretty rich [00:11:28] when we are thinking about logical [00:11:30] statements and making logical [00:11:34] inference [00:11:35] and following inference rules [00:11:37] it feels like natural language is a [00:11:39] little bit slippery and you might want [00:11:40] to have access to some other type of [00:11:42] language so this language like when we [00:11:45] talk about language it doesn't need to [00:11:46] be natural language right language is [00:11:48] just a mechanism for expressing things [00:11:51] it's just a way of expressing okay so [00:11:54] natural language is an example of a [00:11:57] language that allows us to express [00:11:59] things it's kind of informal [00:12:01] we also have programming languages [00:12:03] those are kind of formal like we have [00:12:05] Python or C++ [00:12:07] in addition to this we can have logical [00:12:09] languages and the nice thing about [00:12:11] logical languages is that they're formal [00:12:13] and we can think about the relationship [00:12:14]
between them and formal [00:12:16] connections between them but the other [00:12:18] nice thing about logical languages is [00:12:20] they're actually closer to natural [00:12:22] language than let's say programming [00:12:24] languages because they're declarative so [00:12:26] there is actually a connection [00:12:28] between natural language and logical [00:12:30] languages and in one of the later [00:12:31] modules we're actually going to talk [00:12:33] about how we can write expressions in [00:12:35] first-order logic if we have a natural [00:12:37] language statement so [00:12:40] in this lecture this week we are [00:12:42] going to be talking about two types of [00:12:43] logical languages propositional logic [00:12:45] and first-order logic [00:12:48] all right so what is the goal of a [00:12:51] logical language so the goal here is to [00:12:54] be able to represent knowledge right you [00:12:56] want to be able to represent knowledge [00:12:57] about the world but that is not the only [00:12:59] goal in addition to that you want to be [00:13:01] able to reason about that
knowledge. Right, it's not just about representing; it's about how we can make logical statements, how we can run inference rules, and how we can make new statements and reason about them. So an example: if I tell you it's raining and it is wet, you should be able to reason about that and figure out that, well, it is raining, right? You're telling me raining and wet, so both of those statements are definitely true, and you should be able to reason about that. That is the goal of a logical language.

[00:13:35] And when we think about logic, we have three main ingredients. I'm going to go into these details a little bit more in our first module, but let me just give you a quick overview. We're going to have syntax, and syntax basically tells us what are
the symbols of that language; basically, it defines the set of valid formulas. So syntax here, for example in propositional logic, could be "rain and wet". Okay, so when I write "rain and wet" here, in syntax land this doesn't have any meaning; it's just a symbol, just a shape. "Rain" and "wet" don't have any meaning; they're just symbols. So when you're talking about syntax, you're really talking about the symbols that are the building blocks of the language. But syntax alone is not going to be able to define a language. In addition to syntax, what we need is semantics: we actually need to give meaning to the syntax. So for each one of these formulas, we need to be able to specify a meaning, and it has a very precise meaning.
So the meaning corresponds to an assignment, a configuration of the world, a setting of the world that corresponds to that syntactic formula. For example, in the case of "rain and wet", it corresponds to a specific meaning where rain takes value one and wet takes value one; this is a specific model, a specific world we live in, and in this world "rain and wet" has this particular meaning. Okay, so the main ingredients of logic are, first off, syntax and semantics, and once we have syntax and semantics, then we can talk about inference rules. We can actually talk about what we can infer now that we have a set of formulas, a set of knowledge about the world. So given that we have a formula f, could we infer, could we derive, a new formula g? Could we figure out whether g is true or not based on f?
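One brute-force way to answer this kind of question is to enumerate all models. The sketch below is my own illustration, not course code: it treats a model as a dictionary of truth values and formulas as Python predicates, and checks that every model satisfying f also satisfies g.

```python
from itertools import product

symbols = ["rain", "wet"]

def entails(premise, conclusion):
    """premise entails conclusion iff conclusion holds in every model where premise holds."""
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if premise(model) and not conclusion(model):
            return False
    return True

print(entails(lambda m: m["rain"] and m["wet"], lambda m: m["rain"]))  # True
print(entails(lambda m: m["rain"], lambda m: m["wet"]))                # False: rain alone says nothing about wet
```

This is exponential in the number of symbols, which is exactly why the lecture goes on to study inference rules that manipulate formulas directly.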
For example, if you tell me "rain and wet" as a formula, from that I can derive that rain is also true, right? Because it's got to be rain and wet, so from that I should be able to derive rain. And that's what we are going to spend quite a bit of time on this week: what are the inference rules that we can play around with, and how do they apply to different types of logic? Okay, so three ingredients: syntax, semantics, and inference rules.

[00:16:11] All right, so let me just make a bigger point about this difference between syntax and semantics, because the difference might be a little subtle. So again, if you think about syntax, syntax is talking about the valid expressions that are in your language; it's basically talking about the symbols, right, the things that are valid to write in this language.
Semantics is about what the expressions mean. So let me give you an example here. Let's say you're looking at 2 plus 3 versus 3 plus 2. Okay, 2 plus 3 and 3 plus 2 have different syntax: they're not the same, they don't look the same. If I have no idea what two means, or plus means, or three means, then 2 plus 3 has nothing to do with 3 plus 2; they have two very different syntaxes. But they have the same semantics, right? If I know what plus means, what two means, and what three means, and I know that 2 plus 3 is five and 3 plus 2 is five, then they have the same meaning, the same semantics. So: different syntax, but the same semantics. On the other hand, we can have settings where we have the same syntax, things look the same, but they have different meanings. For example, you can look at 3 over
python 2.7 versus python three [00:17:25] two in python 2.7 versus python three and in that case it's like it looks the [00:17:27] and in that case it's like it looks the same three looks the same the divide [00:17:30] same three looks the same the divide looks the same two looks the same so [00:17:32] looks the same two looks the same so syntactically these two are exactly the [00:17:34] syntactically these two are exactly the same thing but semantically they have [00:17:36] same thing but semantically they have different meanings they actually [00:17:38] different meanings they actually correspond to different values right [00:17:39] correspond to different values right when you're doing this in python 2.7 or [00:17:42] when you're doing this in python 2.7 or python 3. okay so so again we have two [00:17:45] python 3. okay so so again we have two expressions that have the same syntax in [00:17:47] expressions that have the same syntax in this case but they have different [00:17:48] this case but they have different meanings and different semantics so [00:17:50] meanings and different semantics so syntax and semantics are two different [00:17:52] syntax and semantics are two different things both of them are needed to define [00:17:54] things both of them are needed to define a logical language [00:17:56] a logical language and i want to kind of like end with this [00:17:59] and i want to kind of like end with this this view uh that and this diagram that [00:18:01] this view uh that and this diagram that i'm gonna come back to and explain it in [00:18:03] i'm gonna come back to and explain it in in a bit more detail in kind of future [00:18:05] in a bit more detail in kind of future modules so so the idea is we have two [00:18:08] modules so so the idea is we have two worlds here we have on the left we have [00:18:11] worlds here we have on the left we have uh syntax syntax land and on the right [00:18:14] uh syntax syntax land and on the right we have semantics 
[00:17:56] And I want to end with this view, this diagram, that I'm going to come back to and explain in a bit more detail in future modules. So the idea is that we have two worlds here: on the left we have syntax land, and on the right we have semantics land. Okay, so in syntax land we have formulas; I'm going to use these rectangles to represent formulas, different formulas that I can write, like "rain and wet". And each one of these formulas has a meaning in semantics land, and those meanings are called models here. And our goal throughout the lecture is to first define syntax and semantics for different types of logics, and then come up with inference rules that allow us to manipulate these formulas, these compact formulas that are nice and expressive, and derive new formulas whose meanings are also entailed by the meanings of our current formulas. So, more on this later; if this is confusing, I will talk about this in
more detail in a few modules. Okay.

[00:19:10] Just to give you a quick overview of the different types of logics we will be talking about: there are different types of logics, and in this order they're increasing in expressivity. This week we are going to be talking about the bolded ones, so we will be talking about propositional logic, and specifically a subset of propositional logic that only has these things called Horn clauses; we'll talk about what those are. And we will also talk about first-order logic: first-order logic only with Horn clauses, and just generally first-order logic. There are other types of logic that we're not discussing in this class, second-order logic, temporal logic, and they're actually quite useful in a
variety of fields: in programming languages, in robotics, and in formal methods. If you're interested in any of these, we can chat about it offline. One other point I want to make is that as we increase the level of expressivity of the logic here, as you go down this list, the expressivity of these logics gets higher and higher, but what you're losing is computational efficiency: if you want to run inference rules, it's going to become much more difficult if you're running them on first-order logic as opposed to propositional logic. So there is a trade-off between computational efficiency and the expressivity of the logical language.

[00:20:35] All right, so with that, this is the roadmap for this week's lectures. We're going to start with modeling, and by modeling here what I mean is defining
the syntax and semantics of logic. So we're going to talk about propositional logic and its syntax, and then we're going to talk about the semantics of propositional logic. At that point we're going to switch to inference and discuss inference rules; in general we're going to be talking about two main inference rules, modus ponens is one, and the other one is resolution. So we're going to talk about how, under propositional logic, we do modus ponens and how we do resolution. At that point we're going to switch back to a higher level of expressivity in terms of our models: we're going to talk about first-order logic, again the syntax and semantics of first-order logic, and after that we're going to talk about modus ponens again, as an inference rule for first-order logic. And we have an optional module at the end, which is about resolution for first-order logic; this one gets a little bit hairy.
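The modus ponens rule that anchors the inference part of this roadmap can be previewed with a small sketch. This is my own illustration, not course code, and the representation of rules as (premises, conclusion) pairs is an assumption: modus ponens says that from p1 ∧ ... ∧ pk → q and known facts p1, ..., pk, we may derive q, and repeating this until nothing changes is forward chaining.

```python
def forward_chain(facts, rules):
    """Repeatedly apply modus ponens: whenever every premise of a rule is a
    known fact, add the rule's conclusion as a new fact, until nothing changes."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)
                changed = True
    return facts

# Hypothetical knowledge base: rain is a fact; rain -> wet; wet -> slippery.
rules = [(["rain"], "wet"), (["wet"], "slippery")]
print(sorted(forward_chain(["rain"], rules)))  # ['rain', 'slippery', 'wet']
```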
If you're interested, you can take a look at it and at how resolution gets applied to first-order logic. And then, we're not talking about learning during this week. In general, learning has more recently been applied to logical formulas, specifically in the area of formal methods; people have been thinking about learning logical formulas from data, from demonstrations, but that's outside the scope of this class, so we will not be talking about that.

================================================================================
LECTURE 042
================================================================================
Logic 2 - Propositional Logic Syntax | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=LBjNaewGJzk
---
Transcript

[00:00:04] All right, so in this module we're going to be talking about the syntax of propositional logic. If you remember this diagram, what we're going to be talking about in this lecture is thinking about the syntax of logic, thinking
about the semantics of logic, the meaning of logic, and in addition to that, thinking about inference rules: how can we manipulate logic? One point I want to mention is that you might have seen logic in other classes; you might have seen logical formulas and been able to manipulate them and move things around, and that's not really the point here. The point here is to have this general framework, this more principled way of looking at logic, where we can think about algorithms that can manipulate logical formulas and can run inference rules, just more generally, from an algorithmic perspective. So the point is not for you to be able to do logic and move things around; the point is to have an algorithm that can do logic. Because the goal of this class is to have an artificial intelligence that can do similar things to how
humans would do it. So the point is not for you to do logic; the point is for the AI to be able to do logic. An analogy to that is the Bayesian networks lecture: last week we talked about Bayesian networks, and in that setting you might be able to compute conditional and marginal probabilities perfectly fine, you might be able to manipulate things perfectly fine, but that was not the point. The point was to have an algorithm, maybe like Gibbs sampling, that can be applied more generally to any Bayesian network, not to a single example. So we're basically trying to do a similar thing in the space of logic here. Okay.

[00:01:41] So let's talk about syntax. What is syntax? The syntax of propositional logic consists of a few different things. It consists of propositional
symbols, so these could be a or b or c; these take boolean values. Then, based on these propositional symbols, we can build formulas on top of them. The propositional symbols are also commonly known as atomic formulas, and you can make more complicated formulas based on these atomic formulas using a set of logical connectives: negation, and, or, implication, and bidirectional implication.

[00:02:19] So let me actually write that here. We're going to start with syntax. What does syntax have? Syntax has propositional symbols, so these are like a, b, c, and so on, and then we can have formulas defined on top of them; let me use f for formula. And how do I define formulas? I use these logical connectives to create formulas, the connectives that we just
talked about. Okay. So here are a couple of examples of how we go about it. We can build these formulas recursively: let's call f and g formulas here, and if f and g are formulas, then I can build even more formulas on top of them. I can have negation of f as a new formula, or f and g as a new formula, or f or g as a new formula, f implies g, or f bidirectionally implies g, meaning f is equivalent to g, you can think of it like that. Okay, here are a few examples. If a is a boolean symbol, a is a formula by itself. Negation of a is a formula. Negation of b implying c is a formula; I've just used a bunch of connectives and created a more complicated formula here. I can have this one as a formula, right: negation of a is a formula, negation of b is a formula, negation of b implying c is a formula, negation of b or d is a formula, and then the or of these and
the and of these is also going to be a formula. Okay. Negation of negation of a is a formula; why is that? Because a is a formula, negation of a is a formula, and the negation of that is also a formula. And this one, a followed by negation of b, is not a formula. Why is that the case? Well, negation of b is a formula and a is a formula, but a and negation of b are not connected with each other using any logical connective; this is just putting two logical formulas right next to each other without any connective, and that's not a formula. a plus b is not a formula. Why is that? Because plus just doesn't have any meaning here; its syntax is not defined, right? I never defined plus, so that doesn't make sense in this logic; it's not defined in this language.
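The recursive grammar above translates directly into a recursive data structure. The sketch below is my own representation, not the course's (the class names Symbol, Not, And, Or, Implies are illustrative): a formula is either an atomic symbol or a connective applied to subformulas, so ill-formed strings like "a ¬b" or "a + b" simply cannot be constructed, because every constructor requires a defined connective joining well-formed subformulas.

```python
from dataclasses import dataclass
from typing import Union

# A formula is either an atomic propositional symbol or a connective applied
# to subformulas; a bidirectional-implication class would follow the same pattern.
@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass(frozen=True)
class Not:
    arg: "Formula"

@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Implies:
    left: "Formula"
    right: "Formula"

Formula = Union[Symbol, Not, And, Or, Implies]

a, b, c, d = (Symbol(s) for s in "abcd")

f1 = Not(a)                                   # negation of a
f2 = Implies(Not(b), c)                       # negation of b implying c
f3 = And(Implies(Not(b), c), Or(Not(b), d))   # connectives composed recursively
f4 = Not(Not(a))                              # well-formed: each Not wraps a formula
```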
Okay. And one other point I want to mention here, and we'll talk about semantics soon, is that you can think of syntax as just symbols. Syntax doesn't have any meanings, right? Syntax is just the symbols that we are using, with no meanings assigned to them, and the job of semantics is to assign meanings: what does negation actually mean, what does implication mean? When we're talking about syntax, I could use any other symbol; I could use this symbol and just define that in my logic, and that would be the syntax of my logic. So don't assign any meanings just yet; it's just symbol manipulation when we're talking about syntax. But in the next module we're going to talk about semantics and giving some meanings.
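As a small preview of that next step, here is a sketch of how semantics can be implemented: a minimal recursive evaluator, of my own design (the nested-tuple formula representation and the name `evaluate` are assumptions, not course code), that assigns each formula a truth value given a model mapping symbols to booleans.

```python
def evaluate(formula, model):
    """Recursively evaluate a formula (nested tuples) under a model (symbol -> bool)."""
    if isinstance(formula, str):          # atomic propositional symbol
        return model[formula]
    op = formula[0]
    if op == "not":
        return not evaluate(formula[1], model)
    if op == "and":
        return evaluate(formula[1], model) and evaluate(formula[2], model)
    if op == "or":
        return evaluate(formula[1], model) or evaluate(formula[2], model)
    if op == "implies":                   # f -> g is false only when f is true and g is false
        return (not evaluate(formula[1], model)) or evaluate(formula[2], model)
    raise ValueError(f"unknown connective: {op}")

print(evaluate(("and", "rain", "wet"), {"rain": True, "wet": True}))        # True
print(evaluate(("implies", "rain", "wet"), {"rain": False, "wet": False}))  # True: vacuous implication
```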
================================================================================
LECTURE 043
================================================================================
Logic 3 - Propositional Logic Semantics | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=N37yIn1jX98
---
Transcript

[00:00:05] All right, so in this module we're going to be talking about semantics. We started talking about syntax in propositional logic, and we defined propositional formulas, which basically take propositional symbols and logical connectives and put them together to symbolically create something that we call a formula. That was a syntactic view of things, where we didn't assign any meanings; there were no meanings for anything, it was just symbols. [00:00:33] What we would like to do in this module is assign meanings to those syntactic formulas that we defined, and that corresponds to semantics. So in this module we're going to be talking about semantics and giving meanings to those formulas.
[00:00:47] In general, in this lecture we're going to have a good number of definitions, so I'm going to write out those definitions on a separate whiteboard so we can keep track of them. A good number of definitions are coming up, especially in this module, so let's start with some of them. [00:01:03] The first definition is the definition of a model, and this is a very poor choice of words: we've been using the word "model" throughout the lectures of this class as a different thing, right? We talked about modeling, inference, and learning. But in the logic lectures we're going to assign a different meaning to the word "model," and that has historical reasons, because historically "model" has been used in logic in this particular way, to refer to assignments, really.
So for this lecture and the logic lectures, [00:01:39] let's refer to a model in propositional logic as an assignment of truth values to propositional symbols. I'm going to use the letter w for a model, w for "world"; that's why it's called w. So a model w in propositional logic is just an assignment of truth values. Okay, so what does that mean? Let's look at an example. [00:02:00] Let's say we have three propositional symbols, A, B, and C. How many models do we have? Well, we can have eight possible models, right? Two to the three possible models, or worlds, that we can live in. And a particular w, a particular model, is going to be a particular assignment. So for example, I can pick A equal to 0, B equal to 1, and C equal to 0; that's a model, one w that corresponds to an assignment of truth values to propositional symbols. Okay, so let me write that on our whiteboard.
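The counting argument above (three symbols, 2^3 = 8 models) can be checked directly by enumeration. A minimal sketch; representing a model as a dict from symbol to truth value is my assumption for illustration, not the lecture's notation:

```python
from itertools import product

# A model (world) w assigns a truth value (0 or 1) to each propositional symbol.
# With three symbols a, b, c there are 2^3 = 8 possible models.
symbols = ["a", "b", "c"]

models = [dict(zip(symbols, values))
          for values in product([0, 1], repeat=len(symbols))]

print(len(models))                         # 8, i.e. 2 ** 3
print({"a": 0, "b": 1, "c": 0} in models)  # True: the model picked in the example
```

The second print confirms that the particular model from the lecture, A = 0, B = 1, C = 0, is one of the eight.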
[00:02:34] So going back here, I'm going to start under "Semantics," and then we have the word "model," and I'm going to use the letter w for it. Okay, all right, let's go back. [00:03:00] All right, so now we are ready to define this thing that's called the interpretation function, and the interpretation function is the thing that actually gives us semantics, gives us meaning. So what is an interpretation function? Let F be a formula, which is what we defined in syntax, and let w be a model. An interpretation function I takes a formula and a model, and it basically outputs true or false; it tells us whether w satisfies F or w doesn't satisfy F. [00:03:33] So the interpretation function is really the thing that binds formulas and models: formulas live in the syntactic land, and models live in the semantic land.
[00:03:46] The interpretation function is trying to connect them and tell us whether that is true or not. So let me go back here and write out "interpretation function." It's a function I that takes F, a formula, and w, a model, and gives us true or false. So let me go here and show this. [00:04:14] Let's say that we have a formula F; I'm going to draw our formulas using these rectangles. In the syntactic land you might have a number of formulas; let's say I have one formula F. And in the world of semantics we might have different models, right? These are different models, or worlds, that we can live in. And I can pick a specific one; let me call that w. [00:04:49] And what I can do is basically connect the formula to this w using this interpretation function.
[00:05:01] So I have an interpretation function of F and w that gives me true or false, depending on whether w satisfies F or not. I can have other w's in this space of models, in this semantic land. So how do we define an interpretation function? [00:05:14] The way we define an interpretation function is recursively, in a similar way to how we defined our syntax. We're going to start with propositional symbols. We have these propositional symbols A, B, and C, right? These take Boolean values, and the interpretation function of each one of these propositional symbols P and a model w is just going to return w of that propositional symbol. Remember, w is an assignment to these. So going back here, let me give you an example. [00:05:45] My w is going to be an assignment that maybe says A takes the value 0, and my propositional symbol is just A.
[00:05:58] So if I look at the interpretation function over P and w, it's basically interpreting A in the model where A is assigned 0, and that returns the value 0. Okay, so that is just the base case. [00:06:18] Then, when we think about more general formulas, how are they defined? They're defined based on logical connectives applied over propositional symbols, and based on that we can recursively define this interpretation function. So I can have a formula F and a formula G, and I can create this kind of truth table: the interpretation function of F and w could take value 0 or 1, and the interpretation function of G and w could take value 0 or 1. And then, for any of these other logical connectives, I can basically define them recursively.
[00:06:56] What would the interpretation function of negation-of-F and w be? It would basically negate this column, right? So it would be 1, 1, 0, 0. Or if I'm thinking about the interpretation of F-and-G and the model w, what would that be? That would be the interpretation function of F and w ANDed with the interpretation function of G and w, so basically ANDing these two columns, and that gives us these values, and so on. [00:07:21] Similarly, we can define the interpretation function over F or G, or F implying G, or the bidirectional implication of F and G, and so on, and then we can assign meanings to these more generic formulas. [00:07:36] All right, so let's look at an example of how we do this recursively. Let's say we have a formula F, and that formula is negation-of-A AND B, bidirectionally implying C. Okay, so that's my formula.
[00:07:51] I have a model; that model is a truth assignment to my propositional symbols A, B, and C. Let's say A is 1, B is 1, and C is equal to 0, and now I can call the interpretation function on F and w and see what its value would be. [00:08:04] How do we do that? Well, let's start with these nodes. At this node I can call the interpretation function over the symbol A and w. What is that equal to? It's equal to just 1, because I'm just going to read it off my table of models; that's just equal to 1. Then negation of A is going to be equal to 0. What is the interpretation function of B and w? Again, I have a model that tells me B takes value 1, so that's 1. And then if I take the interpretation function of negation-of-A AND B, that is the AND of these two, so 0 AND 1 gives me 0.
[00:08:44] Similarly, I can look at the interpretation function of C and w; reading that off my model, it is equal to 0. And then, when I look at the equivalence of this C and negation-of-A-AND-B, well, these two are equal, so that's just going to be equal to 1. Okay, so this is just showing recursively how we run an interpretation function. [00:09:03] See, there's no learning here; this is defined by logic, right? You could define your own logic, that would be fun, but this is defined by the propositional logic that we have defined using our formulas and our connectives and so on, and I'm just computing this; I'm not doing anything fancy here. [00:09:25] All right, so for each formula and model, we can interpret it using this interpretation function, and that gives us a value of 0 or 1.
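The recursive evaluation just traced can be written out as a short recursive function. A sketch under the same assumed nested-tuple representation of formulas (my notation, not official course code); it reproduces the lecture's worked example F = (¬A ∧ B) ↔ C with w assigning A = 1, B = 1, C = 0:

```python
def interpret(f, w):
    """Recursively evaluate formula f in model w, returning 0 or 1."""
    if isinstance(f, str):             # base case: I(p, w) = w(p)
        return w[f]
    op = f[0]
    if op == "not":
        return 1 - interpret(f[1], w)  # negate the subformula's value
    a, b = interpret(f[1], w), interpret(f[2], w)
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "implies":
        return (1 - a) | b             # f -> g is equivalent to (not f) or g
    if op == "iff":
        return 1 if a == b else 0      # biconditional: true when both sides agree
    raise ValueError(f"unknown connective: {op}")

# The worked example: f = (not a and b) <-> c, with a=1, b=1, c=0.
f = ("iff", ("and", ("not", "a"), "b"), "c")
w = {"a": 1, "b": 1, "c": 0}
print(interpret(f, w))  # 1: both sides evaluate to 0, so the biconditional holds
```

As in the lecture's trace: ¬A is 0, 0 AND 1 is 0, C is 0, and since the two sides agree the biconditional returns 1.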
[00:09:36] Okay, so now I'm going to define this thing that's called models of F, and basically it's a set of w's; it's the set of models where the interpretation function is equal to 1. So going back here, there could be one w, and I can check the interpretation function of F and w; I can also be looking at a set of models, and I can call that models of F. [00:10:03] And what is models of F? Let me write that here: models of F is the set of w's such that the interpretation function of F and w is equal to 1.
[00:10:21] So let me write that in my set of definitions. We talked about the interpretation function, and we talked about a single model; now we have models, M of F. What is that? It is the set of w's such that the interpretation function of F and w is equal to 1. All right, so now we have our models; let's go back here. [00:10:56] Basically, intuitively, you can think of models of F as all the worlds, all the assignments, where F holds; anything outside of it is still a possible world, but one where this particular F doesn't necessarily hold. [00:11:15] Okay, so let's look at an example. Let's say our formula F is Rain or Wet. Then, if I think about models, all possible models come from Rain taking value 0 or 1 and Wet
taking values zero or one and with taking value 0 1 so i can kind of like [00:11:29] taking value 0 1 so i can kind of like show that by this 2x2 grid that's all [00:11:32] show that by this 2x2 grid that's all possible models but what is models of f [00:11:35] possible models but what is models of f models of f is when rain or red holds [00:11:39] models of f is when rain or red holds and that is the shaded area right so the [00:11:41] and that is the shaded area right so the shaded area here is showing kind of like [00:11:44] shaded area here is showing kind of like the meaning of this formula f which is [00:11:47] the meaning of this formula f which is again symbolically written doesn't have [00:11:49] again symbolically written doesn't have a meaning but models of f is assigning a [00:11:52] a meaning but models of f is assigning a meaning to it so it's saying hey these [00:11:54] meaning to it so it's saying hey these grids is showing like what is the [00:11:57] grids is showing like what is the meaning of rain or wet okay [00:12:00] meaning of rain or wet okay and and the key idea here in logic in [00:12:02] and and the key idea here in logic in general is there's this formula although [00:12:04] general is there's this formula although it is written like again syntactically [00:12:06] it is written like again syntactically and then it is it is a symbolic [00:12:08] and then it is it is a symbolic representation it's a very compact like [00:12:10] representation it's a very compact like representation of a giant set of models [00:12:13] representation of a giant set of models right so so in general the nice thing [00:12:15] right so so in general the nice thing about logic is is we could use formulas [00:12:18] about logic is is we could use formulas to compactly represent like very like [00:12:20] to compactly represent like very like large meanings like a lot of times like [00:12:22] large meanings like a lot of times like exponential meanings could be 
[00:12:23] A lot of the time, exponentially large meanings can be represented by formulas that are pretty compact and nice, and that is the power of logic: you can write things compactly, and then you can do operations on them, you can do inference on them, and so on, and that is really nice. [00:12:37] All right, so that was formulas and models, with interpretation functions binding formulas and models. What we want to do now is think about how we could do operations here: what would new formulas add, in terms of meaning, to the knowledge that we already have? [00:12:55] For that, let's define something called a knowledge base. A knowledge base is a set of formulas that I already know. So if I have a system, say a virtual assistant, that I want to add logic to, or that I want to speak to using language or using logic,
[00:13:12] that system has a knowledge base, which is a set of formulas that are already represented; it's a conjunction, or intersection, of a bunch of things that it already knows. So let me go back here and write out "knowledge base": I'm going to use KB for this, and it is a set of formulas that you already know. [00:13:40] So we might already know a formula that says Rain or Snow, or we might already know there's Traffic. This is our knowledge base. So then what happens is that someone might come and give me a new formula, and what we're interested in is how that affects our knowledge. [00:14:01] Before getting there: a knowledge base is a set of formulas, so it is in the syntax land. What would be its analog in the semantics land? It would be models of KB.
[00:14:12] And what is models of KB? Models of KB is going to be an intersection of the models of each F. Let's look at an example. [00:14:21] Let's say I have a formula F1, and F1 says it's raining and snowing; and maybe I have F2, and F2 says there is traffic. Let me separate these. And I have a knowledge base, and my knowledge base has F1 and F2 in it. So someone already told me it's raining and snowing, and there's traffic. [00:14:52] So what would models of KB be? Models of KB is going to be the intersection of models of F1 with models of F2. Why is that? Because if you think about it, F1 is a formula, and F1 has a set of models corresponding to it, models of F1. And F2 is another formula that I'm just adding to my knowledge base,
[00:15:26] and that has a bunch of other models corresponding to it. And my knowledge base is now going to be the intersection of these two, because as we add more formulas, as we add more knowledge to our knowledge base, our set of models becomes smaller and smaller, because we are adding more constraints, which is pretty interesting, right? [00:15:48] So in general, let me maybe write that in a different color: if I have a knowledge base and I add a new formula to it, say I union it with a new formula F, what would the effect of that be on models of KB? The effect on models of KB is going to be what I had for models of KB, intersected with models of F. [00:16:11] So adding new formulas is constraining our models, constraining the meaning more and more. Because if you have raining-and-snowing
[00:16:21] you have traffic as a whole other set of models, the intersection of the two is going to give me these models. Okay, so also let me connect this: these models here correspond to these models there. All right, so let's go back here. So that's how we define the models of a knowledge base. Let's look at another example here. Let's say we're looking at rain as one formula. The models of rain are going to be this shaded area, where rain is equal to one. And then we have another formula, rain implies wet. And what are the models of rain implies wet? It is basically the negation of rain, or wet; so it is this shaded area. If I'm looking at a knowledge base that has both of these in it, then the models of that knowledge base are just going to be the intersection of these two shaded areas, which is basically this square.
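The picture described here, where each formula carves out a set of models and the knowledge base keeps their intersection, can be sketched in a few lines of Python. This is a brute-force illustration over a tiny symbol set; the encoding of formulas as boolean functions and the `models` helper are assumptions for this sketch, not code from the course.

```python
from itertools import product

# A model is an assignment of truth values to the propositional symbols.
# For this sketch, a formula is encoded as a function from a model (dict) to bool.
SYMBOLS = ["rain", "wet"]

def models(formula):
    """Return the set of models (frozensets of true symbols) that satisfy formula."""
    result = set()
    for values in product([False, True], repeat=len(SYMBOLS)):
        w = dict(zip(SYMBOLS, values))
        if formula(w):
            result.add(frozenset(s for s in SYMBOLS if w[s]))
    return result

f1 = lambda w: w["rain"]                    # Rain
f2 = lambda w: (not w["rain"]) or w["wet"]  # Rain implies Wet

# Models of the knowledge base = intersection of the models of its formulas.
kb_models = models(f1) & models(f2)
print(kb_models)  # the single model where both rain and wet are true
```

With two symbols there are only four candidate models, so enumeration is cheap; the point is just to watch M(KB) shrink as each formula's model set is intersected in.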
[00:17:16] That's where we have both rain and rain implies wet holding. All right, sounds good. And then this is what I've already basically mentioned: we have a knowledge base; if we add a formula to it, we increase the size of our knowledge base, but we're shrinking the size of the set of models, because we're constraining things more and more. So we are constraining the meaning part. All right, so now let's talk about this idea of what happens if I have a knowledge base and I add a new formula. So I have a knowledge base, I'm trying to add a new formula, and we'll see what happens. There are three things that can happen. One option is entailment. What entailment says is: if I have KB as my knowledge base, and you come and tell me a new formula f, and that formula is not adding anything to my knowledge base, then we say we have entailment. Okay, so this is a
[00:18:08] scenario where f is just not adding any information or any new constraints; it's basically telling me things I already knew. Okay, so we say KB entails f, and that is written using this double-line symbol. So we say KB entails f if and only if the models of KB are a subset of the models of f. Let's look at an example here. So let's go back here; maybe I'll start a new page. So we have three options; one is called entailment. So let's start with a knowledge base, and my knowledge base maybe is rain and snow. So I have a formula in my knowledge base that says rain and snow, and that has models corresponding to it; so this is the models of KB. And you might come and tell me a new formula, and that new formula is rain. And if you tell me rain, and I already have rain and snow in my knowledge base, that doesn't add any knowledge to me,
[00:19:21] right? I already knew it was raining. So then the models of f are going to be a superset of the models of KB. Okay, so we say KB entails f if and only if the models of KB are a subset of the models of f. So f didn't tell me anything new; I already knew that, and that is entailment. So let me go back here and maybe add these under our definitions. So now we have defined entailment. Okay, all right, so let's go back here. So rain and snow entails rain. Okay, so that was one option. Another option is contradiction. So what is contradiction? Contradiction is a scenario where you're telling me a new formula f: I already have a knowledge base KB, you tell me a new formula f, and that is contradicting my knowledge base. Okay, so in models land, what happens is that
[00:20:35] the models of KB don't have any intersection with the models of f. So f contradicts our knowledge base if and only if the models of KB intersected with the models of f are going to be the empty set. Right, so let's look at an example; let's maybe go back here. So our second option is contradiction, so let's write that here: contradiction. So contradiction is a scenario where I know some knowledge base; my knowledge base is maybe rain and snow again. So I think it's raining and snowing, and then you come and tell me a new formula, and that new formula is maybe the negation of snow. Okay. And then that contradicts my knowledge base, right? So if that contradicts my knowledge base, what happens is that there is a set of models of KB and there is a set of models of f, and they don't have any intersection. So contradiction is a scenario where the models of KB
[00:21:41] intersected with the models of f are empty. One other interesting thing to notice here is that, if you think about contradiction, contradiction is very related to entailment. Contradiction is the same thing as entailing the negation of f. And why is that the case? Because if you look at the models of f, the models of the negation of f are anything outside of it, right? So if this is the models of the negation of f, then what is happening is that the models of KB are a subset of the models of the negation of f. And if you remember our definition of entailment, that is the same thing as KB entailing the negation of f. Okay, so that's pretty interesting, because contradiction is the same thing as entailing the negation of f. All right, so those were the two cases so far, right? You either told me a new formula and I already knew it, so that is entailment, or you tell me a new
[00:22:50] formula and that contradicts the knowledge base that I've had, so that is contradiction. Okay, and let's add that here. So we talked about entailment, and now we've talked about contradiction. And we wrote entailment as KB entailing a formula, and contradiction as KB entailing the negation of a formula. Okay, all right, so there is a third case; let's talk about that third case. Let me skip that. So we talked about contradiction being very related to entailment, and KB contradicting f is the same thing as KB entailing the negation of f. All right, so the third case here is what we're calling contingency, and that is when you're telling me a formula and that formula is actually telling me something I didn't know; it's telling me some non-trivial information. Okay, so that is when the models of KB have some non-trivial intersection with the models
[00:23:51] of f. Okay, so that is when we write: the models of KB intersected with the models of f are going to be a subset of the models of KB, but a strict subset of the models of KB; if this is equal, we get entailment, so we don't include equality here. Okay, all right, let's look at an example; maybe let's go back here. So our third case is contingency, and that is when I have maybe my knowledge base, and maybe my knowledge base is just rain this time, and you come and tell me a new formula, and that new formula is snow. Okay, so my knowledge base thought it is raining, so I have my models of the knowledge base corresponding to raining here, and then you come and tell me, hey, by the way, it is also snowing. And the models of snowing are here, and there is some non-trivial intersection going on here. Okay, so contingency is when the models of KB intersected with the models of f are going to be a subset, and it's
[00:24:56] not going to be equal to the models of KB. And similarly, the empty set is going to be a subset of this, but it's not going to be equal. So you get some non-trivial information, something that you didn't know, and that gets added. Okay, so that is contingency. And going back here, let me add contingency as the third option. And contingency is when you have these non-trivial intersections; I'm not going to write it out. All right, let's go back here. Okay, so we have these three possibilities: you give me a new formula, and I'm either entailing it, or contradicting it, or I have contingency. So now let's talk about how we would use these ideas if we want to implement a virtual assistant. Remember, we started this lecture thinking about having a virtual assistant where we can talk to it in logic or language.
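The three cases can be checked directly on model sets. Here is a minimal brute-force sketch under the same assumptions as before (formulas encoded as boolean functions over a tiny symbol set; the helper names are made up for illustration, not the course's implementation):

```python
from itertools import product

SYMBOLS = ["rain", "snow"]

def models(formula):
    """All models (frozensets of true symbols) satisfying formula."""
    out = set()
    for values in product([False, True], repeat=len(SYMBOLS)):
        w = dict(zip(SYMBOLS, values))
        if formula(w):
            out.add(frozenset(s for s in SYMBOLS if w[s]))
    return out

def classify(kb_formulas, f):
    """Is f entailed by, contradicting, or contingent on the knowledge base?"""
    m_kb = models(lambda w: all(g(w) for g in kb_formulas))
    inter = m_kb & models(f)
    if inter == m_kb:     # M(KB) is a subset of M(f): f adds nothing new
        return "entailment"
    if not inter:         # no overlap at all: f conflicts with KB
        return "contradiction"
    return "contingency"  # proper, non-empty overlap: genuinely new information

kb = [lambda w: w["rain"] and w["snow"]]                     # KB: Rain and Snow
print(classify(kb, lambda w: w["rain"]))                     # entailment
print(classify(kb, lambda w: not w["snow"]))                 # contradiction
print(classify([lambda w: w["rain"]], lambda w: w["snow"]))  # contingency
```

This same three-way classification is what drives the tell and ask responses described in the lecture: entailment maps to "I already knew that" (or a definite yes), contradiction to "I don't believe that" (or a definite no), and contingency to "I learned something new" (or "I don't know").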
[00:25:56] And that virtual assistant, right, you can tell it some information, or we can ask it questions. And then maybe you want to implement this tell operation. So if I want to implement a tell operation in this virtual assistant, the virtual assistant is going to have some knowledge base at the moment, some KB, and I tell it f, a new formula f. So what can happen? So if I tell it a new formula, maybe I tell it it is raining, so I do tell rain, three things can happen, right? It can either entail f: my knowledge base might already have raining in it, and in that case the response to the tell operation, when I tell it it is raining, is going to be "I already knew that." Okay, so if my virtual assistant already entails rain, it should respond to me with "I already knew that." Okay, if it contradicts, then it should say "I don't believe that," because its knowledge base basically
[00:26:47] says it's not raining, and now you're telling it it is raining, and because of that it would respond with "I don't believe that," because its knowledge base tells it the opposite. Or, if you're telling it something new, you're telling it it is raining and it didn't know that, or didn't have any information about it, then it should say "I learned something new," and based on that thing that you're telling it, it should update its knowledge base; it should update its KB with this new formula that it is raining. Okay, so now we can implement a tell operation based on these three ideas of entailment, contingency, and contradiction. In a very similar fashion, we can also implement an ask operation. If you ask it "is it raining?", then based on that it can go ahead and it can answer yes, a definite yes, if KB already entails f, that is, if we have entailment. It should answer
[00:27:38] "no" if we have a contradiction, if KB contradicts f, or if KB entails the negation of f; so it should give us a definite no, because there's a contradiction. Or it should tell us "I don't know" if there is contingency: if you ask it "is it raining?" and it doesn't know, it should just say "I don't know." Okay, so going back to the things we were defining here, let me move this inference rule thing. Okay, so let me just write "tell and ask." So we talked about this tell and ask operation, and we can basically implement tell and ask based on this entailment, contradiction, and contingency. Okay, all right, so I want to just do a quick side note here. I don't want to go into this in too much detail, but there is a connection between the things we were talking about here and some of the topics we discussed like two weeks ago in
[00:28:33] Bayesian networks, right? So we've been talking about this idea of models, and a model is the same thing as an assignment, right? And you can basically think of having a Bayesian network as a distribution over these assignments, over these models. Right, I can have a equal to zero, b equal to zero, c equal to zero, and I can have a probability assigned to that; the probability of that could be 0.3. And I can have another assignment, or model, and I can have another probability assigned to it. So from a Bayesian network perspective, from a probabilistic perspective, one can think about logic in a probabilistic way and think about the probability of a formula given a knowledge base. So when you have a knowledge base, you have some knowledge, and you're asking about a formula, one can think of, instead of thinking
[00:29:19] about just these three different things, entailment, contradiction, and contingency, one can think of a probability, an actual value, right? The probability of the formula given a knowledge base. So what is that going to be equal to? That is going to be equal to the probability of the models w that exist in the intersection of the models of KB and the models of f, over the probability of all possible models of the knowledge base. So the w's in the denominator are all possible models of my knowledge base; I'm going to sum the probabilities of all possible models of my knowledge base in the denominator, and in the numerator I'm going to just focus on models that are in the intersection of the models of KB and the models of f. In other words, P(f | KB) is the sum of P(w) over w in M(KB ∪ {f}), divided by the sum of P(w) over w in M(KB). And then the union: M(KB ∪ {f}) is equal to M(KB) ∩ M(f); that's why there's a
[00:30:15] union here. You remember, if you add an f to your KB, you're shrinking your set of models; so that's why this numerator is smaller, right? You're shrinking your set of models by adding this f to the knowledge base. Okay. And if you think about this fraction, this is a number between zero and one, and now we have probabilities; we actually have a probability for f being satisfied or not, given a knowledge base. But in general, this was just a quick diversion talking about a probabilistic view of this. There's quite a bit of work actually in logic and probabilistic versions of it, thinking about probabilistic model checking: instead of giving just zero-one values, what would be a probabilistic view of it? We're not going to go into the details of those in this class, and basically you can think of these probabilities as these
[00:31:01] three different ways of looking at the problem that we have been talking about. If this probability is equal to zero, then we basically have an answer of no; we have contradiction, right? f is not satisfied. Well, if this probability is equal to one, if the numerator and denominator are equal to each other, then you're answering yes: f is not adding any information, so we have entailment. And if you get any other value in between, any other value between zero and one, then we are in a contingency situation, and we basically say we don't know. Okay, all right, so that was just a quick kind of connection to a probabilistic generalization of some of the things we were talking about with Bayesian networks.
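As a sketch of this probabilistic view, one can put an explicit distribution over models and compute P(f | KB) as just described: sum P(w) over M(KB ∪ {f}) in the numerator and over M(KB) in the denominator. The particular distribution below is made up purely for illustration, and the helpers are assumptions of this sketch:

```python
from itertools import product

SYMBOLS = ["rain", "wet"]

def all_models():
    """Every assignment of truth values to the symbols."""
    for values in product([False, True], repeat=len(SYMBOLS)):
        yield dict(zip(SYMBOLS, values))

def prob(w):
    """An arbitrary example distribution over models (sums to 1)."""
    table = {(False, False): 0.4, (False, True): 0.1,
             (True, False): 0.1, (True, True): 0.4}
    return table[(w["rain"], w["wet"])]

def p_given_kb(f, kb):
    """P(f | KB): mass on models satisfying both, over mass on models of KB."""
    num = sum(prob(w) for w in all_models() if kb(w) and f(w))
    den = sum(prob(w) for w in all_models() if kb(w))
    return num / den

kb = lambda w: w["rain"]  # KB: Rain
f = lambda w: w["wet"]    # query: Wet
print(p_given_kb(f, kb))  # 0.4 / 0.5 = 0.8, strictly between 0 and 1
```

A value of 0 would correspond to contradiction (a definite no), 1 to entailment (a definite yes), and anything strictly in between, like the 0.8 here, to contingency ("I don't know").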
[00:31:49] But now let's just go back to the same problem we were talking about, right? So we've talked about these three different things, entailment, contingency, and contradiction; we've talked about how we can have tell and ask operators based on that. Now we are going to talk about this idea of satisfiability. So what is satisfiability? A knowledge base KB is satisfiable if the models of KB are not empty. Okay, very simple: if the models of KB are not empty, we have satisfiability. Okay, so why is satisfiability useful? Why am I talking about satisfiability? Because satisfiability is a well-known problem; we have really good solvers for it, SAT solvers, and we're going to talk about that in one slide really quickly. So it's nice to think about these three different things, entailment and contingency and contradiction, in view of the problem of satisfiability. Okay, so we have
these three things satisfiability gives me a [00:32:43] three things satisfiability gives me a yes or no answer so how can i use [00:32:46] yes or no answer so how can i use satisfiability to answer if you are in [00:32:48] satisfiability to answer if you are in either in any of these situations [00:32:51] either in any of these situations so so the way you satisfy ability is we [00:32:53] so so the way you satisfy ability is we do two calls to satisfy [00:32:55] do two calls to satisfy so [00:32:57] so the first call is in general if i want [00:32:59] the first call is in general if i want to think about my ask operator or tell [00:33:01] to think about my ask operator or tell operator and if i want to reduce it to [00:33:03] operator and if i want to reduce it to satisfiability i can i can do two calls [00:33:05] satisfiability i can i can do two calls to satisfy both i can first ask if kb [00:33:08] to satisfy both i can first ask if kb union negation of f is satisfiable or [00:33:11] union negation of f is satisfiable or not okay so what does the answer to that [00:33:14] not okay so what does the answer to that give me so if i get no for that right if [00:33:17] give me so if i get no for that right if kb union negation of f is not [00:33:19] kb union negation of f is not satisfiable i have entailment right so i [00:33:22] satisfiable i have entailment right so i get [00:33:23] get my answer for entailment here [00:33:25] my answer for entailment here and if i get yes for that that doesn't [00:33:27] and if i get yes for that that doesn't answer everything right like if i just [00:33:29] answer everything right like if i just get yes for this i don't know if i'm in [00:33:31] get yes for this i don't know if i'm in a contingency situation or a [00:33:32] a contingency situation or a contradiction situation so what do i [00:33:34] contradiction situation so what do i need to do i need to do another call to [00:33:36] need to do i need to do another call to 
satisfy voting and the second call to [00:33:38] satisfy voting and the second call to satisfyability is asking if kb union f [00:33:42] satisfyability is asking if kb union f is satisfiable and what does that check [00:33:44] is satisfiable and what does that check well well if i get no for that then i'm [00:33:47] well well if i get no for that then i'm getting contradiction remember [00:33:48] getting contradiction remember contradiction is the same thing as [00:33:50] contradiction is the same thing as entailing negation of f that is why the [00:33:53] entailing negation of f that is why the answer for this gives me contradiction [00:33:55] answer for this gives me contradiction and if i get yes for that i get [00:33:56] and if i get yes for that i get contingency so what i've just done is is [00:33:59] contingency so what i've just done is is if i in general if i want to know if i'm [00:34:02] if i in general if i want to know if i'm in the entailment contradiction or [00:34:03] in the entailment contradiction or contingency situation [00:34:05] contingency situation then i can basically figure that out [00:34:07] then i can basically figure that out with two calls to satisfiability [00:34:09] with two calls to satisfiability and and why do i want to know i'm in any [00:34:12] and and why do i want to know i'm in any of these situations because that helps [00:34:13] of these situations because that helps me implement my ask and tell operations [00:34:17] me implement my ask and tell operations so going back here [00:34:19] so going back here so we talked about ask and so we talked [00:34:21] so we talked about ask and so we talked about how it relates to entailment [00:34:22] about how it relates to entailment contradiction and contingency and now we [00:34:24] contradiction and contingency and now we have talked about satisfiability [00:34:26] have talked about satisfiability as a way of answering [00:34:29] as a way of answering which scenario we are in okay 
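The two-call reduction above can be sketched in a few lines of Python. This is a minimal illustration, not the course's actual code: `is_satisfiable` here is a brute-force enumerator standing in for a real SAT solver, and the encoding of formulas as functions over a model dictionary is my own.

```python
from itertools import product

def is_satisfiable(formulas, symbols):
    """Brute-force stand-in for a SAT solver: try every assignment."""
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if all(f(model) for f in formulas):
            return True
    return False

def classify(kb, f, symbols):
    """Two satisfiability calls decide which scenario we are in."""
    if not is_satisfiable(kb + [lambda m: not f(m)], symbols):
        return "entailment"      # KB union {not f} unsatisfiable
    if not is_satisfiable(kb + [f], symbols):
        return "contradiction"   # KB union {f} unsatisfiable
    return "contingency"         # both calls answered yes

# KB = {Rain, Rain -> Wet}: asking about Wet gives entailment,
# asking about not-Rain gives contradiction.
kb = [lambda m: m["Rain"], lambda m: (not m["Rain"]) or m["Wet"]]
print(classify(kb, lambda m: m["Wet"], ["Rain", "Wet"]))       # entailment
print(classify(kb, lambda m: not m["Rain"], ["Rain", "Wet"]))  # contradiction
```

Note the order of the calls: only after the first call answers "yes" does the second call distinguish contradiction from contingency.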
[00:34:31] which scenario we are in okay and then how do we answer satisfiability [00:34:34] and then how do we answer satisfiability so so that's a good question to ask so [00:34:36] so so that's a good question to ask so what is satisfiability so satisfiability [00:34:39] what is satisfiability so satisfiability and checking satisfiable to the sad [00:34:40] and checking satisfiable to the sad problem in propositional logic is [00:34:43] problem in propositional logic is basically just a special case of solving [00:34:46] basically just a special case of solving a constrained satisfaction problem a csv [00:34:49] a constrained satisfaction problem a csv and we have already like like learned [00:34:50] and we have already like like learned about cscs and solving csvs so what that [00:34:53] about cscs and solving csvs so what that means is we can basically check [00:34:55] means is we can basically check satisfiability we can basically uh [00:34:59] satisfiability we can basically uh check if the csb problem like was the [00:35:01] check if the csb problem like was the solution to the csv problem and solve [00:35:04] solution to the csv problem and solve satisfied we'll give you the algorithms [00:35:05] satisfied we'll give you the algorithms that we already have access to okay and [00:35:08] that we already have access to okay and this idea of checking satisfiability is [00:35:11] this idea of checking satisfiability is called model checking you're checking if [00:35:13] called model checking you're checking if a model exists or not you're checking if [00:35:15] a model exists or not you're checking if an assignment exists or not okay so so [00:35:18] an assignment exists or not okay so so the mapping of the sat problem to csvs [00:35:21] the mapping of the sat problem to csvs is as follows so prepositional symbols [00:35:23] is as follows so prepositional symbols are basically variables what we used to [00:35:25] are basically variables what we used to call 
variables [00:35:26] call variables formulas are basically constraints and [00:35:30] formulas are basically constraints and then if you have variables and [00:35:31] then if you have variables and constraints you can come up with an [00:35:32] constraints you can come up with an assignment and that assignment is [00:35:34] assignment and that assignment is basically a model so you're checking if [00:35:36] basically a model so you're checking if a model exists or not you're checking if [00:35:39] a model exists or not you're checking if if a satisfying assignment exists or not [00:35:41] if a satisfying assignment exists or not okay [00:35:43] okay let's look at an example so let's say [00:35:44] let's look at an example so let's say our knowledge base has these two [00:35:46] our knowledge base has these two formulas in it we have a or b [00:35:49] formulas in it we have a or b and we have b um [00:35:51] and we have b um bi-directional implication negation of c [00:35:54] bi-directional implication negation of c okay [00:35:55] okay all right so we have three symbols a b [00:35:58] all right so we have three symbols a b and c these symbols are the same things [00:36:00] and c these symbols are the same things as csp variables so we can have three [00:36:03] as csp variables so we can have three nodes these three variables and then we [00:36:05] nodes these three variables and then we have basically two formulas these [00:36:07] have basically two formulas these formulas create constraints in our csv [00:36:10] formulas create constraints in our csv so we have a or b and then we have we [00:36:11] so we have a or b and then we have we have b equivalent negation of c [00:36:13] have b equivalent negation of c and then what are we doing so we have a [00:36:15] and then what are we doing so we have a csv we can solve it right we can find an [00:36:17] csv we can solve it right we can find an assignment a consistent assignment for [00:36:19] assignment a consistent 
assignment for it which is the same thing as a [00:36:21] it which is the same thing as a satisfying model and if you find an [00:36:24] satisfying model and if you find an assignment this problem is satisfiable [00:36:26] assignment this problem is satisfiable model shaking comes up with a model for [00:36:28] model shaking comes up with a model for it [00:36:29] it and then if it is not satisfiable it's [00:36:31] and then if it is not satisfiable it's going to return as unset it doesn't come [00:36:33] going to return as unset it doesn't come up with any assignments [00:36:35] up with any assignments so that's kind of nice going back here [00:36:37] so that's kind of nice going back here right like this problem that you've been [00:36:39] right like this problem that you've been talking about this tell and ask [00:36:41] talking about this tell and ask operation [00:36:42] operation reduces to entail my contradiction and [00:36:44] reduces to entail my contradiction and contingency i can use satisfiability to [00:36:47] contingency i can use satisfiability to cause the satisfiability to to answer [00:36:50] cause the satisfiability to to answer that then how do i do that well i use [00:36:52] that then how do i do that well i use model checkers to do that so so that's [00:36:54] model checkers to do that so so that's called model checking checking the [00:36:56] called model checking checking the satisfiability which is basically [00:36:58] satisfiability which is basically solving a csv okay [00:37:02] solving a csv okay all right so going back here okay so so [00:37:05] all right so going back here okay so so what does model checking do model [00:37:06] what does model checking do model checking takes as an input a knowledge [00:37:08] checking takes as an input a knowledge base and what does it output it outputs [00:37:10] base and what does it output it outputs if there exists a satisfying model or [00:37:12] if there exists a satisfying model or not and if 
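As a sketch of model checking on this exact example, here is a brute-force checker for KB = {A or B, B iff not C}. The function name `model_check` and the lambda encoding are illustrative, not from the lecture's code.

```python
from itertools import product

def model_check(kb, symbols):
    """Return a satisfying model of the KB, or None if unsatisfiable."""
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if all(f(model) for f in kb):
            return model
    return None

# The example KB: {A or B,  B <-> not C}
kb = [
    lambda m: m["A"] or m["B"],          # A or B
    lambda m: m["B"] == (not m["C"]),    # B iff not C
]
print(model_check(kb, ["A", "B", "C"]))  # a satisfying assignment

# An unsatisfiable KB makes model checking return None, i.e. "unsat".
print(model_check([lambda m: m["A"], lambda m: not m["A"]], ["A"]))  # None
```

The returned dictionary is exactly a model in the lecture's sense: an assignment to every propositional symbol that makes all formulas in the KB true.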
And if a satisfying model exists, it returns that model. Okay, so it checks whether models of KB is not empty. And there are a good number of algorithms out there that do model checking. One of the older ones is DPLL, a well-known algorithm for satisfiability and model checking. What it does is use backtracking search, and quite a bit of pruning and quite a bit of heuristics go into it to make sure it can solve the problem as fast as possible. Some more recent algorithms are things like WalkSAT, which is pretty similar to Gibbs sampling and does a randomized local search. There are a good number of satisfiability solvers out there; Z3 is a famous solver that you can look into if you're interested in solving SAT problems.
And with that, we now have a good idea of syntax and a good idea of semantics. Next, we would like to talk about what the formulas get us, right? Why do we live in formula land? Why do we even want to look at syntax? It turns out that we can do inference on formulas, and that buys us quite a bit. So in the next module we're going to be talking about what formulas buy us and how to do inference with rules.

================================================================================ LECTURE 044 ================================================================================ Logic 4 - Inference Rules | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=RIk67yGMVv4 --- Transcript

[00:00:05] All right, so in this module we will be talking about inference rules. If you remember, so far we've been talking about syntax and semantics, and now we would like to talk about how we can play around with formulas, manipulate them, and apply inference rules to them. I want to talk about this diagram a little bit more for a second
before jumping into inference rules. Let me go back to my whiteboard here. Basically, what I've been drawing here is: we live in the syntax land, and formulas live in the syntax land, so I'm going to draw them like this. Maybe I have formula f1, formula f2, through maybe formula fn. Okay. And then in the semantics land I give meanings to these formulas, right? Each formula has a corresponding set of models, so each one of these formulas will correspond to something I'm calling models of f1. f2 might have another set of models, models of f2, and so on, and you might have a bunch of other ones. So let's say I only have three of these; actually, let me make it three to keep this simpler. So let's say I have f3, and f3 has another set of models corresponding to it, models of f3. Okay. And then this part defines our knowledge base, right? We talked about a knowledge base being a set of formulas, and the shaded area corresponds to models of the knowledge base; so this is models of our knowledge base. This is what we have talked about so far. Now we want to talk about what inference rules really do. If you have a set of knowledge in your knowledge base, this set of formulas, the idea of inference rules is: could you apply a set of syntactic rules on them (I'm going to call them inference rules) and infer something, a new formula? So from the formulas f1 through f3 that you have, could you infer a new formula, a new g, just based on the formulas that you have, by symbolically manipulating them? And the question is: could you make sure that the g you're inferring actually has a set of models that is a superset of the models of KB? Because ideally you want to infer something that follows directly from the formulas you have. So ideally you would want to be in a situation where models of g is a superset of models of KB, and what does that mean? That means KB entails g, right, since every model of the KB is then a model of g. So could you have a set of inference rules that end up giving you a g such that models of g is a superset of models of KB? Could we come up with those sets of inference rules and those g's? That is the idea of inference rules, and that is what we're going to be talking about today in this lecture. Okay.
All right, and that's basically what this diagram shows: we have a set of formulas, each of them corresponding to a set of models, and at the end of the day I want to apply a set of inference rules on these formulas and come up with models that are a superset of the models of my knowledge base. Would I be able to do that? All right, so let's talk about that; let me give you an example of what I mean. Let's say I know it is raining, so in my knowledge base I have that it is raining, and then I have that if it is raining then it is wet, so Rain implies Wet. If I tell you that it is raining, and that rain implies wet, what can you tell me just from that knowledge? From that knowledge you should be able to infer that, well, therefore it is wet, right? It's raining, and raining implies wet, so it's got to be wet. Okay, so that is the idea of inference rules: could we have a rule that basically infers Wet here, just based on the formulas? In general, in inference rules we have a set of premises, a set of formulas like Rain and Rain implies Wet, and based on that we want to come up with a conclusion; in this case, for example, that conclusion is that it is wet. That defines inference rules in general. There's a specific type of inference rule that we're going to talk about; it's a pretty simple one, and it's called modus ponens. Modus ponens is a very simple inference rule, and what it says is that for any propositional symbols p and q, if in my premises, in my knowledge base, I have p and p implies q, that allows me to conclude q, kind of like the example we saw here. And you should think of inference rules as very syntactic, symbolic views of the world: I basically just look at my knowledge base, and if I find anything that matches the form p and anything that matches the form p implies q, from that I can infer q. Okay, let me put this here in my set of definitions. So now we are at inference rules, and we are going to be talking about modus ponens. Modus ponens is an inference rule; it tells us that if we have p and p implies q, we can infer q. All right, so in general we can write inference rules in this way: we have a set of formulas f1 through fk, and following an inference rule allows us to conclude g; it depends on what inference rule we are using, and modus ponens is an example that we have just seen. And again, these rules are applied directly on syntax, and they do not care about semantics.
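Because modus ponens is purely syntactic, it can be written as a pattern match over formulas, with no reference to what the symbols mean. A minimal sketch, with formulas encoded as strings and `('->', p, q)` tuples (my own encoding, not the course's):

```python
def modus_ponens(kb):
    """From p and ('->', p, q) both in the KB, conclude q."""
    return {f[2] for f in kb
            if isinstance(f, tuple) and f[0] == "->" and f[1] in kb}

print(modus_ponens({"Rain", ("->", "Rain", "Wet")}))  # {'Wet'}
print(modus_ponens({("->", "Rain", "Wet")}))          # set(): premise Rain missing
```

Nothing here evaluates truth values; the rule fires purely on the shape of the formulas, which is the point being made above.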
They don't care about what raining means, or what Wet and Rain and the connection between them actually mean, right? They're just applied on the syntax, on the formulas. And that's kind of the power of logic: we talked about formulas as a compact way of representing much larger meanings, exponentially large meanings a lot of the time, and now, on these very compact formulas, we can apply syntactic rules, inference rules, and based on them infer new formulas that have new meanings. Okay, so if I want to think about what an inference algorithm does, kind of like a meta-algorithm, an inference algorithm does something of this form. We have an input, and that input is a set of inference rules; we've talked about modus ponens as an example, but in general I might have other inference rules. And what I'd like to do is repeat this loop until no more changes apply to my knowledge base: I choose a subset of formulas from my knowledge base, and if I can match my inference rule and infer a new formula g, I add g back to the knowledge base, and I keep doing this until there are no more new formulas to be added. Okay, so that is what an inference algorithm does. And one other definition here is this idea of derivation and proving. What we say is that a knowledge base proves, or derives, a formula f if and only if f eventually gets added to the knowledge base. So, going back to our definitions: now we have a definition of derivation, or proving, and we say a knowledge base derives f, represented by this single-line turnstile symbol. So, going back here: if I have f1 through f3 in my knowledge base, and applying inference rules gets me a new g, I would say my knowledge base is deriving, or proving, g. Okay. And so in the semantics land I have this idea of entailment, which might be different from what we have in the syntactic land, which is this idea of inferring, or proving, or deriving. All right, we'll talk about the relationship between these two in a few slides, but let me go back to talking about derivation a little bit more. Okay, so that is derivation, that is proving. Let's look at an example.
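The meta-algorithm just described, looping until no rule adds anything new, can be sketched as follows. The name `forward_chain` and the tuple encoding of implications are illustrative assumptions, and modus ponens is the only rule plugged in here:

```python
def modus_ponens(kb):
    """From p and ('->', p, q) both in the KB, conclude q."""
    return {f[2] for f in kb
            if isinstance(f, tuple) and f[0] == "->" and f[1] in kb}

def forward_chain(kb, rules):
    """Apply inference rules until the knowledge base stops changing."""
    kb = set(kb)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(kb) - kb
            if new:                # derived something not yet in the KB
                kb |= new          # add it back to the knowledge base
                changed = True
    return kb

kb = forward_chain({"Rain", ("->", "Rain", "Wet")}, [modus_ponens])
print("Wet" in kb)  # True: the KB derives Wet
```

In the notation above, every formula in the returned set is one the knowledge base derives (KB ⊢ f): it eventually got added to the KB.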
implies being wet and i have that wet implies slippery okay so the [00:09:14] that wet implies slippery okay so the question is can you can i apply the [00:09:16] question is can you can i apply the inference algorithm on this modus ponens [00:09:18] inference algorithm on this modus ponens using using just modus ponens and [00:09:21] using using just modus ponens and rules how does how does that work let's [00:09:24] rules how does how does that work let's actually try this out using using the [00:09:27] actually try this out using using the system that we looked at in the [00:09:29] system that we looked at in the overview lecture so um let's say it is [00:09:32] overview lecture so um let's say it is raining [00:09:33] raining okay [00:09:34] okay so it says i learned something i can [00:09:36] so it says i learned something i can look at the knowledge base so let's look [00:09:37] look at the knowledge base so let's look at what is in the knowledge base so [00:09:39] at what is in the knowledge base so raining is the in the knowledge base [00:09:41] raining is the in the knowledge base okay [00:09:42] okay i can say if it is raining [00:09:45] i can say if it is raining then it is wet [00:09:47] then it is wet okay so it says oh i learned something [00:09:50] okay so it says oh i learned something let's look at the knowledge base so it's [00:09:52] let's look at the knowledge base so it's ha it has it is raining [00:09:54] ha it has it is raining it has raining implies wet that is that [00:09:56] it has raining implies wet that is that is what this means because rain implies [00:09:58] is what this means because rain implies width is equivalent to not rain or red [00:10:01] width is equivalent to not rain or red right because that's what implication [00:10:03] right because that's what implication logical implication means [00:10:05] logical implication means and then based on these two things it [00:10:07] and then based on these two things it actually derives 
[00:10:09] wet: it applies modus ponens. Remember, I have "rain" and "rain implies wet"; what does modus ponens give me? Modus ponens gives me "wet," so I can derive it. Let's add "if it is wet, it is slippery" and see what that gives us. It says "I learned something." Let's look at the knowledge base — we have a bunch of things now. These are the things I added: I added "rain," I added "rain implies wet," I added "wet implies slippery." From the first modus ponens that we applied, we got "wet." We can apply modus ponens on "wet" and "wet implies slippery" and get "slippery," so "slippery" is added. We also get another formula here, "rain implies slippery" — and modus ponens actually doesn't get us that. This one comes from other types of inference rules, not just modus ponens; if you apply other inference rules you might actually get "rain implies slippery" here.
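The derivation loop just demonstrated can be sketched in a few lines of Python. This is a minimal stand-in for the demo system shown in the lecture, not that system itself; the `forward_chain` name and the (premises, conclusion) rule encoding are illustrative choices:

```python
# Forward chaining with modus ponens as the only inference rule.
# Facts are propositional symbols; each rule is (premises, conclusion),
# standing for p1 ∧ ... ∧ pk -> q.

def forward_chain(facts, rules):
    """Apply modus ponens repeatedly until the knowledge base stops changing."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Modus ponens: if every premise is known, add the conclusion.
            if set(premises) <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

kb = forward_chain({"rain"}, [(["rain"], "wet"), (["wet"], "slippery")])
print(sorted(kb))  # ['rain', 'slippery', 'wet']
```

Note that the loop converges exactly as described in the lecture: once "wet" and "slippery" are added, no rule fires again, and a formula like "rain implies slippery" is never produced, since modus ponens here only ever adds plain symbols.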
[00:11:04] Okay, all right, so let's look at this exact example here. "Rain" and "rain implies wet" will get us "wet," right? Then we apply modus ponens again on "wet" and "wet implies slippery," and that gets us "slippery." And if the only inference rule that we have is modus ponens, then we have converged: the knowledge base is not changing anymore. We have derived "wet" and "slippery" — we have derived new formulas — but we have basically converged at this point and we can't derive anything more. Okay. So there is a set of other things we can't derive here. For example, we haven't derived "not wet" — probably a good thing, because "not wet" is actually contradictory to our knowledge base; it's actually not true, so we shouldn't be able to derive "not wet." That's a good thing.
[00:11:59] In addition to that, we weren't able to derive "rain implies slippery," which is actually true: if you think about entailment and about what is true, "rain implies slippery" is entailed here, but we weren't able to get it by just applying modus ponens. We will talk in a few slides about why it is that we can't get "rain implies slippery," and what we can do to make sure we get everything that is entailed. And that is the same question we see here: what is the relationship between entailment and derivation? How are they related — are they the same thing or are they doing different things, and does it depend on the inference rule? Okay, all right. So the story so far is: we have semantics. Semantics is really about truth; it's
[00:12:54] about entailment, about meaning — about what is actually true. When you say a knowledge base entails f, what that means is that the models of the knowledge base are a subset of the models of f; in terms of meaning, that is what truth is. On the other hand, we've talked about syntax. In syntax we just do symbol manipulation using inference rules; we've looked at modus ponens as an inference rule, and we have looked at derivation: knowledge base derives f. Okay. So how are these two related? Let's talk about that, and that brings us to the idea of soundness and completeness. So let's look at an example. Imagine that you have a glass, and the things that go inside of the glass are
formulas. [00:13:48] And imagine that anything that is inside of the glass is the truth. What does that mean? That means that the knowledge base entails those formulas, so every formula that is true is going to be inside of the glass. The idea of soundness is that if I am applying inference rules — if I'm running a bunch of inference rules — the formulas that are derived from those inference rules should also be inside of the glass; I want to make sure that they're also true. So a set of inference rules is sound if the set of formulas derived following those inference rules is a subset of the truth, that is, the set of formulas that are entailed by the knowledge base. Okay, so they're going to be inside of the glass; they're going to be true. Maybe they don't fill the
glass — that's fine — [00:14:39] but what this is telling me is that anything I'm deriving is still going to be true; I'm not going to derive something that's absolutely false. And that's a very important property that you want to have in general: you want inference rules that are sound, because otherwise we would be deriving things that are absolutely false, and such an inference rule is not useful; we want to at least derive things that are true. Okay, so that is the idea of soundness. On the other hand, there is the other side of the story, which is completeness. Completeness is about making sure that you're deriving everything that is true. Again, remember, everything that is inside of the glass is true, and the idea of completeness is that you have to make sure that the
formulas that are entailed — [00:15:30] the formulas that are inside of the glass — are going to be a subset of the formulas that are derived. What that means is that your derivation rules get you all the formulas that are true, or even more than that: if you talk about completeness without worrying about soundness, you might even be deriving things that are outside of the glass, but you want to make sure that you are deriving everything that is inside of the glass too — everything that is entailed. That's the idea of completeness. Okay. So if you put soundness and completeness together, you get a filled-up glass: you get everything that is inside of the glass, and just everything that's inside of the glass, which is everything that
is true, everything that is entailed. [00:16:16] Okay, so soundness and completeness are about the truth, the whole truth, and nothing but the truth. Soundness gets you nothing but the truth: everything you derive is inside of the glass and nothing outside of the glass, because that would be bad — you don't want to get something false. That's what soundness gets you. Completeness gets you the whole truth: it makes sure that you get everything that is inside of the glass, that nothing in there is left out; you're deriving all the formulas that are inside of the glass, and that is what completeness gets you. In general you want both soundness and completeness — it would be awesome to get both — and if you get both of them, then entailment and derivation are equivalent.
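The glass picture can be written down compactly. As a set-theoretic paraphrase (my formalization, not verbatim from the lecture), with $\vdash$ for derivation and $\models$ for entailment:

```latex
% Soundness: everything derived is entailed ("nothing but the truth")
\{ f : \mathrm{KB} \vdash f \} \subseteq \{ f : \mathrm{KB} \models f \}
% Completeness: everything entailed is derived ("the whole truth")
\{ f : \mathrm{KB} \models f \} \subseteq \{ f : \mathrm{KB} \vdash f \}
% Both together: derivation and entailment coincide
\mathrm{KB} \vdash f \iff \mathrm{KB} \models f
```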
[00:17:00] If you derive something, it's equivalent to the thing that you're entailing. In practice, soundness is more important, because you don't want to derive something that is false; maybe you don't get all the truth, but maybe that is okay. So in practice we prefer to get soundness first, and then push towards completeness. Okay, so going back to here: soundness and completeness are the thing that connects these two together — soundness and completeness make sure that entailment and derivation are equivalent; I should write that here too. Okay, so we talked about soundness and completeness as things that relate entailment and derivation. All right. And these are properties of inference rules, so the question is: is modus ponens sound, and is modus ponens complete? What can we say about modus ponens? Because that's the only inference rule we
have seen so far. Okay. [00:18:08] So remember modus ponens: we have "rain" and "rain implies wet," and modus ponens gets us "wet." Is that sound? How do we check soundness? Soundness is about the meaning: it's about checking whether the thing you're getting is actually inside of the glass — actually entailed. So how do we check that? We look at the models of "rain" — this shaded area — and we look at the models of "rain implies wet" — this other shaded area — and we take the intersection of them, because the models of these two formulas together is the intersection of the two sets of models; that is the darker area. And the thing we're going to check is whether this darker area is a subset of the models of "wet": is it entailed? We're checking entailment because
that is about the truth right like that is the thing that checks the truth so [00:19:01] is the thing that checks the truth so models of wet is here [00:19:03] models of wet is here and then yeah this darker area is [00:19:05] and then yeah this darker area is actually a subset of models of it so it [00:19:08] actually a subset of models of it so it turns out that modus ponens is actually [00:19:10] turns out that modus ponens is actually sound right like we are getting we are [00:19:12] sound right like we are getting we are we are inferring formulas that are [00:19:14] we are inferring formulas that are actually true okay [00:19:16] actually true okay so it is sound [00:19:18] so it is sound let's look at the difference in french [00:19:19] let's look at the difference in french rule so i have a made up inference rule [00:19:21] rule so i have a made up inference rule that says if you get wet [00:19:23] that says if you get wet and if you get rain implies wet can you [00:19:25] and if you get rain implies wet can you infer rain from that okay so you've got [00:19:28] infer rain from that okay so you've got wet and raining implies red is it [00:19:30] wet and raining implies red is it raining that's the thing you're checking [00:19:33] raining that's the thing you're checking so this inference rule similarly i can [00:19:35] so this inference rule similarly i can look at models a bit i can look at [00:19:37] look at models a bit i can look at models of that implies [00:19:39] models of that implies rain implies wet [00:19:41] rain implies wet this shaded area is going to be the [00:19:43] this shaded area is going to be the intersection and that is not a subset of [00:19:46] intersection and that is not a subset of models of rain right like as you can see [00:19:47] models of rain right like as you can see here that's not a subset of models of [00:19:49] here that's not a subset of models of rain so what that means is we don't have [00:19:51] rain so what that 
[00:19:53] So what that means is we don't have entailment here, and because of that, this particular inference rule is actually not sound. All right, so the nice thing about modus ponens is that it's actually sound. But the next question to ask is: is modus ponens complete? And I want you to remember the example we looked at: we got the formula "rain implies slippery," and that wasn't from modus ponens — modus ponens wasn't able to get it. This kind of gives us a hint that modus ponens is not complete: it's not going to get everything that is actually entailed, everything that is actually true. But let's look at an example — I'm not going to do justice to proving that modus ponens is not complete; I'm mainly just going to look at a few examples. So let's look at another example here. Let's say our knowledge base is: it is raining, and if it is raining or snowing
[00:20:45] it will be wet. Okay. So the question is: can we infer "wet" using our modus ponens rule? The first question is: is it actually true that it would be wet? Think about it intuitively, logically: you know it's raining, and you know that if it is raining or snowing then it's going to be wet. So then it's got to be wet, right? It's raining, so it's got to be wet. If you just think about it intuitively, you realize that "wet" has to be entailed here — from a meaning perspective, "wet" should be included; we should be able to get to it and incorporate it into the knowledge base. But modus ponens is not able to infer that. Why is it not able to infer that? Because in modus ponens we
[00:21:38] have this very specific syntactic form of f, and f implies g, and this formula doesn't match that: it has this "or," and modus ponens doesn't really have ors in it — it doesn't have any branchings — and because of that I can't apply modus ponens here. So the knowledge base here actually entails "wet" — it is going to be wet — but syntactically, using just modus ponens, I'm not going to be able to derive it. And based on this example you can kind of see that modus ponens is not complete: we're not able to derive everything. Okay. One other thing I want to note here: modus ponens is kind of interesting — it's just looking at positive information. You have a bunch of positive clauses — sorry, positive formulas — and based on those formulas you're able to infer something
[00:22:26] positive, and again infer something positive, and again infer something positive. It doesn't really have these ors or negations, and that is why it's not able to infer this particular formula: we have an "or" here, and modus ponens is not able to capture that. And again, it's applying things syntactically, so it doesn't care about meaning. So how can we fix this? Going back here: we just saw that modus ponens is sound — that is great — but it was not complete. Ideally I want both soundness and completeness, because ideally what I'm deriving would be equivalent to what I'm entailing; I want both of them. So the question we're asking now is: how can we fix the fact that modus ponens is not complete? And that's the topic of the next few modules.
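The incompleteness example can be sketched by the same model-enumeration idea (the predicate encoding and variable names are illustrative, not from the lecture): the knowledge base {rain, (rain or snow) -> wet} semantically entails "wet," yet modus ponens has no rule form to match against.

```python
from itertools import product

syms = ["rain", "snow", "wet"]

def kb(m):
    # Knowledge base: rain, and (rain or snow) -> wet.
    return m["rain"] and ((not (m["rain"] or m["snow"])) or m["wet"])

# Semantic check: every model of the KB satisfies "wet", so KB entails wet.
entails_wet = all(m["wet"]
                  for vals in product([False, True], repeat=len(syms))
                  for m in [dict(zip(syms, vals))]
                  if kb(m))
print(entails_wet)  # True

# Syntactic check: modus ponens only matches rules of the form
# (symbol ∧ ... ∧ symbol) -> symbol, and "(rain or snow) -> wet" has no
# such encoding, so no rule ever fires and "wet" is never derived.
facts = {"rain"}
rules = []  # the or-rule cannot be written as (premise symbols, conclusion)
derived = set(facts)
for premises, conclusion in rules:
    if set(premises) <= derived:
        derived.add(conclusion)
print("wet" in derived)  # False
```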
[00:23:16] So we have two options to fix completeness. The first option is to restrict the propositional formulas: maybe propositional logic is too large, and maybe we can restrict it to a fragment of propositional logic that only has these things called horn clauses. And in that scenario — propositional logic with only horn clauses — it turns out that modus ponens is both sound and complete. The other option is: maybe I don't want to change my propositional logic — I want to keep all of propositional logic — but maybe I should be looking at more powerful inference rules. Modus ponens seems pretty simple; maybe there are more powerful inference rules that I can use, and specifically resolution is an inference rule that we're going to be talking about which is both sound and complete. So next module we'll be talking
[00:24:02] about horn clauses — propositional logic with horn clauses — and the fact that modus ponens is sound and complete there; and in the module after that, we'll be talking about resolution, how we can use resolution, and the fact that it is both sound and complete.

================================================================================
LECTURE 045
================================================================================

Logic 5 - Propositional Modus Ponens | Stanford CS221: AI (Autumn 2021)

Source: https://www.youtube.com/watch?v=6bj4z2mt1KE

---

Transcript

[00:00:05] Okay, so in this module we would like to talk about horn clauses — specifically, how modus ponens applies to propositional logic with only horn clauses, and how we can show soundness and completeness in that setting. Okay. So to do that we have to define a few other things, so let me go back to my definitions here. We've been talking about inference rules; we've been talking about modus ponens, derivation, and proving; and we've talked about soundness and completeness. You've seen
that modus ponens is sound but it is not complete, and as a way of fixing that we thought maybe we should restrict our formulas to formulas that only have horn clauses. [00:00:46] So we need to define what a horn clause is, and to define what a horn clause is we have to define what a definite clause is. So I'm going to define a definite clause, and I'm going to define a goal clause, and a horn clause is basically a clause that is either a definite clause or a goal clause; I'll define each of these in a second. [00:01:12] Okay, so what is a definite clause? A definite clause is a clause that has the following form: p1 ANDed through pk implying q, where p1 through pk and q are propositional symbols. One thing I want to mention is that k could be zero too, so you could have almost like true
implies q, so you would end up with just q; so that is also a definite clause. [00:01:40] So here are some examples of definite clauses. Rain AND snow implying traffic is a definite clause, because it has this form of p1 ANDed through pk implying q. Traffic itself is also a definite clause, so q by itself is a definite clause. Not traffic, the negation of traffic, is not a definite clause, because you can't have any negations here, right, these are propositional symbols. And rain AND snow implying traffic OR peaceful is not a definite clause, because we have this OR here. [00:02:14] So again, a definite clause has this form of just positive information implying something positive. And in addition to definite clauses we also have this other thing, and that is called a goal clause. A goal clause is a clause of this
form: p1 ANDed through pk implying false. [00:02:35] So this clause is called a goal clause; for example, traffic AND accident implying false is going to be a goal clause. So what is a horn clause? A horn clause is a clause that is either a definite clause or a goal clause. [00:02:50] And the reason I'm separating out goal clauses here is that goal clauses have a specific form: they're equivalent to the negation of whatever comes first, because the implication is the negation of the left side, or false, and the "or false" goes away. So it's basically just the negation of this first part. And what is the negation of this first part? That is the negation of (traffic and accident), which is negation of traffic or negation of accident. So basically you can think of it as a bunch of ORs of a bunch of negations, and that acts as a goal clause.
And that is also allowed when you talk about horn clauses in general. [00:03:27] All right, so that's a horn clause. Then I'm going to expand this idea of modus ponens. We talked about modus ponens being of the form: p, and p implies q, give us q, right. So the more general modus ponens for horn clauses is of this form: p1 through pk, and p1 through pk ANDed together implying q, give us q. Here is an example: let's say it is wet and it's a weekday, and if it is wet and it is a weekday there is traffic; so this is going to imply traffic for us. That's just a more general form of modus ponens. [00:04:07] All right, so then we have basically this theorem, and the theorem says that if I apply this modus ponens only on horn clauses, then I'm going to get completeness. So modus ponens is complete with respect to horn clauses.
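A single application of this generalized rule might look like the following sketch; the `(premises, conclusion)` encoding and the function name are assumptions for illustration:

```python
# Sketch of the generalized modus ponens rule for definite clauses:
# from p1, ..., pk and (p1 ∧ ... ∧ pk → q), conclude q.

def modus_ponens(derived, clause):
    """Return the conclusion q if every premise is already derived, else None."""
    premises, conclusion = clause
    if all(p in derived for p in premises):
        return conclusion
    return None

derived = {"wet", "weekday"}
rule = (["wet", "weekday"], "traffic")  # wet ∧ weekday → traffic
print(modus_ponens(derived, rule))      # traffic
```
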
[00:04:22] And what that means is: suppose you have a knowledge base that only has horn clauses, p is a symbol, and p is entailed by this knowledge base; then if I just apply modus ponens, if I just apply this particular inference rule, I will be able to derive p. [00:04:39] And that's pretty nice, because in general, remember the ask and tell operators: if you ask me "is p true?", you're really asking me whether p is entailed by the KB. And instead of doing something of the form of model checking and satisfiability and things of those forms that we have talked about, instead of doing all of that and trying to figure out whether the knowledge base really entails p or not, what I can do is basically a symbol manipulation: I can just apply modus ponens on my
knowledge base and see if i can derive it like syntactically or not and then if [00:05:14] it like syntactically or not and then if i can then then then this derivation and [00:05:17] i can then then then this derivation and entailment are equivalent right like if [00:05:19] entailment are equivalent right like if i can derive this based on syntax and [00:05:21] i can derive this based on syntax and based on modus ponens then i would be [00:05:23] based on modus ponens then i would be able to say that the knowledge base also [00:05:25] able to say that the knowledge base also entails p [00:05:27] entails p so going back to this diagram that we [00:05:29] so going back to this diagram that we had before right so so we will have [00:05:32] had before right so so we will have soundness and completeness meaning that [00:05:34] soundness and completeness meaning that um this idea of derivation knowledge [00:05:37] um this idea of derivation knowledge base knowledge base deriving g is going [00:05:39] base knowledge base deriving g is going to be equivalent to knowledge base [00:05:41] to be equivalent to knowledge base entailing g so if you ask me is g true [00:05:44] entailing g so if you ask me is g true or like if you want to add g to the [00:05:46] or like if you want to add g to the knowledge base remember that ask and [00:05:48] knowledge base remember that ask and tell operations that's about asking for [00:05:51] tell operations that's about asking for entailment right and if it is asking for [00:05:53] entailment right and if it is asking for entailment the end right if i'm in a [00:05:55] entailment the end right if i'm in a space where i have sound the same [00:05:56] space where i have sound the same completeness of my inference rules modus [00:05:58] completeness of my inference rules modus ponens in this case then i can just do [00:06:00] ponens in this case then i can just do this derivation which is much simpler [00:06:05] this derivation which is much 
simpler. [00:06:06] All right, so let's just look at an example here. Let's say that my knowledge base is the following set of formulas, and my modus ponens rule is this more general rule: p1 through pk, and p1 through pk ANDed together implying q, give me q. So what happens here? If you ask me, based on this knowledge base, is there traffic, what I can do is check whether the knowledge base derives traffic. And how do I do that? Well, I have rain, and rain implies wet, so if I apply modus ponens on my knowledge base I get wet. I know that it's a weekday, that's in my knowledge base, and I've got wet and added that to my knowledge base. I also have wet AND weekday implies traffic in my knowledge base. With all these three together I can infer, I can derive, traffic.
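The derivation in this example can be sketched as a simple forward-chaining loop that applies modus ponens until nothing new can be derived. The fact/rule encoding is an assumption for illustration:

```python
# Sketch: deriving "traffic" by repeatedly applying generalized modus ponens
# (forward chaining) over a horn-clause knowledge base. Assumed encoding:
# facts are symbols, rules are (premises, conclusion) pairs.

facts = {"rain", "weekday"}
rules = [
    (["rain"], "wet"),                # rain → wet
    (["wet", "weekday"], "traffic"),  # wet ∧ weekday → traffic
]

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        # Modus ponens: if every premise holds, derive the conclusion.
        if conclusion not in facts and all(p in facts for p in premises):
            facts.add(conclusion)
            changed = True

print("traffic" in facts)  # True: the knowledge base derives traffic
```

Because modus ponens is sound and complete for horn clauses, this syntactic loop answers the entailment question without any model checking.
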
and we [00:06:56] knowledgebase derives traffic and we have soundness and completeness because [00:06:58] have soundness and completeness because we are looking at only horn clauses we [00:07:00] we are looking at only horn clauses we are able to say the knowledge base here [00:07:02] are able to say the knowledge base here in this case entails traffic [00:07:05] in this case entails traffic all right so this is kind of like an [00:07:08] all right so this is kind of like an overview of what we have talked about so [00:07:09] overview of what we have talked about so far we've talked about formulas that's [00:07:11] far we've talked about formulas that's in the syntax land they have meanings in [00:07:13] in the syntax land they have meanings in the semantic line we have models for [00:07:15] the semantic line we have models for each of them and then in the semantic [00:07:17] each of them and then in the semantic land if you want to check if you want to [00:07:19] land if you want to check if you want to check something is entailed or not we [00:07:20] check something is entailed or not we have to do satisfiability right we have [00:07:22] have to do satisfiability right we have to have to do model checking and that [00:07:24] to have to do model checking and that was quite involved so instead of doing [00:07:26] was quite involved so instead of doing that if we have a set of inference rules [00:07:28] that if we have a set of inference rules that are going to be sound and complete [00:07:30] that are going to be sound and complete either because maybe our formulas are [00:07:32] either because maybe our formulas are restricted or maybe our inference rules [00:07:34] restricted or maybe our inference rules are fancier then we are able to derive [00:07:37] are fancier then we are able to derive the formula and and that derivation if [00:07:39] the formula and and that derivation if you have soundness and completeness that [00:07:41] you have soundness and 
And that derivation, if you have soundness and completeness, is the same thing as checking entailment. [00:07:45] So in this module we've talked about horn clauses, a kind of restricted version of formulas where we can apply modus ponens. In the next module we'll be talking about resolution, a fancier inference rule, as opposed to changing our formulas, in order to get both soundness and completeness.
================================================================================ LECTURE 046 ================================================================================
Logic 6 - Propositional Resolutions | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=egLAF4dFdBo
---
Transcript
[00:00:05] so in this module we're going to be talking about resolution, which is an inference rule. So far we've been talking about propositional logic, we've been talking about syntax and semantics of propositional logic, and we discussed one inference rule, specifically modus ponens. And the idea of an inference rule is: can we do manipulation of syntax, in the syntactic land, over formulas, in
order to derive, in order to prove, a new formula. And the question is: is that inference rule, under that specific set of logical formulas, sound and complete? [00:00:37] What we have seen is that if I apply just modus ponens on propositional logic, I get soundness but I don't get completeness. And what that means is that if I have a bunch of formulas that are entailed, that are true, I'm not going to be able to get all of them if I apply modus ponens on propositional logic. [00:00:54] So we talked about two ways of solving that, and we discussed the first way: the first idea was, instead of looking at all of propositional logic, let's look at a subset of it, and that subset is propositional logic with only horn clauses. So we defined horn clauses in the last module, and we looked at propositional logic with only horn clauses, and in that case if I apply modus ponens then I get
soundness and completeness, and everything is great. [00:01:19] The other option is: what if I don't want to limit my propositional logic, what if I want to look at all of propositional logic — can I make my inference rule a little bit fancier, a little bit more powerful? So in this module we are going to be talking about a new type of inference rule, specifically called resolution, as a way of getting both soundness and completeness. [00:01:40] All right, so to start with I want to just write out a few things that we're all aware of, but let's get on the same page on all of them. So if we have p implies q, what is that equivalent to? That is equivalent to negation of p, or q. Let's write out some more of these equivalences here. If I have negation of (p and q),
what is that equivalent to? Well, I can apply De Morgan's law, and that gets me negation of p, or negation of q. And then if I have negation of (p or q), what is that going to be? That is going to be equal to negation of p, and negation of q. [00:02:38] All right, so these are a few equivalences that we all agree on; this is just how they are, it's just truth, right. If you look at the truth tables of these you're going to get these equivalences. And the reason I'm writing out these equivalences is that in general I would like to write everything in the form of disjunctions and conjunctions. [00:03:00] Okay, so let me define a few other things here. I'm going to define a literal as a propositional symbol p, or the negation of a propositional symbol, negation of p. So a literal is just p or negation of p,
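The three equivalences above (implication elimination and both De Morgan laws) can be checked mechanically by enumerating every truth assignment, as a small sketch:

```python
# Brute-force check of the stated equivalences over all truth assignments:
#   p → q      ≡  ¬p ∨ q
#   ¬(p ∧ q)   ≡  ¬p ∨ ¬q   (De Morgan)
#   ¬(p ∨ q)   ≡  ¬p ∧ ¬q   (De Morgan)
from itertools import product

for p, q in product([False, True], repeat=2):
    implies = q if p else True                          # material implication p → q
    assert implies == ((not p) or q)                    # implication elimination
    assert (not (p and q)) == ((not p) or (not q))      # De Morgan 1
    assert (not (p or q)) == ((not p) and (not q))      # De Morgan 2

print("all equivalences hold on every row of the truth table")
```
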
where p is just a propositional symbol. [00:03:17] So then, based on that, one can define a clause to be a disjunction of literals. We talked about horn clauses in the last module, but we never defined what a clause is: a clause is just an OR of a bunch of literals, a disjunction of a bunch of literals. So I can have a clause that's like p1, or negation of p2, or p3; this is a clause, because it's just an OR of a bunch of literals. [00:03:48] So then the question is, what is a horn clause? We did define horn clauses last lecture, but we can think about horn clauses a little bit differently here. A horn clause is basically a clause, a disjunction of a bunch of literals, with at most one positive literal. So I'm going to refer to p as a positive literal,
[00:04:09] and negation of p as a negative literal. And a horn clause basically says you have at most one positive literal in your clause. For example, the clause I've written here, p1 or negation of p2 or p3, is not a horn clause, because it has two positive literals, p1 and p3. But I can have another clause, p1 or negation of p2 or negation of p3, and this is going to be a horn clause, because it has at most one positive literal, and that is p1. So this is just another way of looking at horn clauses. [00:04:46] So going back here: we have a implies c; how can we write it? We can write it as negation of a, or c. We have a AND b implying c; what is that equal to? It's the negation of this first part, or c; I can use De Morgan's law, and that gives me negation of a, or negation of b, or c. Again, this is a clause now, and it's a horn clause.
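This disjunctive view of horn clauses can be sketched directly: represent a clause as a set of literals and count the positive ones. The `(symbol, sign)` pair encoding is an assumption for illustration:

```python
# Sketch: a clause as a set of literals, where a literal is a (symbol, sign)
# pair — sign True for p, False for ¬p. Assumed encoding, not the lecture's.

def is_horn_clause(clause):
    """A horn clause is a clause with at most one positive literal."""
    positives = sum(1 for _symbol, sign in clause if sign)
    return positives <= 1

c1 = {("p1", True), ("p2", False), ("p3", True)}   # p1 ∨ ¬p2 ∨ p3
c2 = {("p1", True), ("p2", False), ("p3", False)}  # p1 ∨ ¬p2 ∨ ¬p3
print(is_horn_clause(c1))  # False: two positive literals, p1 and p3
print(is_horn_clause(c2))  # True: at most one positive literal, p1
```
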
[00:05:10] And again, going over what I've defined so far: a literal is a propositional symbol, either positive or negative, either p or negation of p; a clause is just a disjunction of these literals; and a horn clause is just a clause with at most one positive literal. [00:05:27] All right, so now when I'm thinking about modus ponens, I can actually write it out using clauses. Remember, I have a, and a implies c, and that gets me c; that is what modus ponens tells me. Instead of a implies c, I can just write it as a clause: negation of a, or c. And intuitively, what is really happening is that you're cancelling out a and negation of a; that's why we are getting c. [00:05:56] And the reason I'm rewriting modus ponens like this is that it helps us think about the more general resolution
rule that i'll be talking about in a few slides okay [00:06:06] talking about in a few slides okay so the idea of resolution is i don't [00:06:09] so the idea of resolution is i don't want to limit myself to specific types [00:06:11] want to limit myself to specific types of clauses i can talk about general [00:06:13] of clauses i can talk about general clauses and general clauses are what are [00:06:15] clauses and general clauses are what are they there are disjunctions of of [00:06:18] they there are disjunctions of of positive or negative literals [00:06:21] positive or negative literals and the idea of resolution is if you [00:06:23] and the idea of resolution is if you have a bunch of clauses [00:06:25] have a bunch of clauses you'll have a rule you'll have an [00:06:26] you'll have a rule you'll have an inference rule that cancels out your [00:06:29] inference rule that cancels out your positive and negative literals so here's [00:06:32] positive and negative literals so here's an example so if it is raining or [00:06:34] an example so if it is raining or snowing that's part of your knowledge [00:06:36] snowing that's part of your knowledge base [00:06:37] base and if it is not snowing or there is [00:06:40] and if it is not snowing or there is traffic [00:06:41] traffic one can infer that it is raining or [00:06:44] one can infer that it is raining or there is traffic [00:06:45] there is traffic why let's think about like why can't we [00:06:47] why let's think about like why can't we why can't we infer this even intuitively [00:06:50] why can't we infer this even intuitively okay [00:06:50] okay so so if it is snowing right so so so if [00:06:54] so so if it is snowing right so so so if it is snowing then there's got to be if [00:06:56] it is snowing then there's got to be if the snowing is true right there's got to [00:06:59] the snowing is true right there's got to be traffic okay so that's how i get [00:07:01] be traffic okay so that's how i get traffic 
[00:07:04] And if it is not snowing, then there's got to be rain, because it's either snowing or raining; that's how I get rain. So intuitively that is why you're getting this rain-or-traffic, and in some sense you can think about snow and negation of snow cancelling each other out, because whether it is snowing or not snowing, you are going to get traffic or rain out of it. And this is basically the resolution inference rule applied to one example. [00:07:32] One can think about this much more generally: take a clause f1 or ... or fn or p, and another clause negation of p or g1 or ... or gm. The idea of the inference rule is that, based on these two premises, you can conclude a new clause, f1 or ... or fn or g1 or ... or gm, that cancels out p and negation of p. This is called resolution.
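The general rule can be sketched on the same set-of-literals encoding used above (a `(symbol, sign)` pair per literal — an assumed representation, not the lecture's notation):

```python
# Sketch of the propositional resolution rule on clauses represented as
# frozensets of (symbol, sign) literals:
# from f1 ∨ ... ∨ fn ∨ p and ¬p ∨ g1 ∨ ... ∨ gm,
# conclude f1 ∨ ... ∨ fn ∨ g1 ∨ ... ∨ gm.

def resolve(c1, c2):
    """Return every clause obtainable by cancelling one complementary pair."""
    results = []
    for symbol, sign in c1:
        if (symbol, not sign) in c2:
            # Drop p from one clause and ¬p from the other, union the rest.
            resolvent = (c1 - {(symbol, sign)}) | (c2 - {(symbol, not sign)})
            results.append(resolvent)
    return results

rain_or_snow = frozenset({("rain", True), ("snow", True)})          # rain ∨ snow
not_snow_or_traffic = frozenset({("snow", False), ("traffic", True)})  # ¬snow ∨ traffic
print(resolve(rain_or_snow, not_snow_or_traffic))
# cancels snow / ¬snow, leaving the clause rain ∨ traffic
```
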
sound? So that's a very good question to ask. In general we want it to be sound, because we want to be able to derive things that are actually true. So, remembering this example, is it true that I can derive Rain ∨ Traffic here? How do I check that? Well, to check soundness I need to actually get to the models, the meanings, of each one of these formulas, and I need to check entailment. So let's check that on this example. If I have Rain ∨ Snow, what are the models of Rain ∨ Snow? My truth table here is going to be a little bit larger, because I have Snow, Rain, and Traffic, so I need to look at 0/1 values for all of them. Rain ∨ Snow corresponds to these shaded areas: that's the models of Rain ∨ Snow. And then I have the models of ¬Snow ∨ Traffic, which correspond to these other shaded areas.

[00:08:52] And remember, as I add more formulas to my knowledge base, I'm shrinking its set of models, right? I'm adding more constraints, so I'm shrinking the models. That is why the set of models of these two formulas together is the intersection of their models, and the intersection is this darker red area. So if I'm checking whether resolution is sound, I should be checking entailment, and what that means is I should be checking whether the models of what is in my knowledge base are a subset of the models of the new formula that I'm trying to derive. And what's the new formula I'm trying to derive here? Resolution tells me I can derive Rain ∨ Traffic, and if I look at the models of Rain ∨ Traffic, I get this green area. So the question is: is the dark red area a subset of the green area? And in this case it is, [00:09:47] so it turns out that resolution is actually sound. In terms of thinking about the models, thinking about the semantics here, we are getting soundness; we are ensuring that we derive truth by applying resolution. Okay, so resolution is sound.

[00:10:03] Now, as you've seen, resolution only works on clauses, right? I've been defining these clauses, which are disjunctions of literals, and the question is: can I apply resolution to all of propositional logic? And the answer is yes. It turns out that even though resolution only works on clauses, that is actually enough, and the reason it is enough is that you can take any propositional formula and write it as a conjunction of a bunch of clauses, and that's called conjunctive normal form. Okay, so a
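The soundness check described here, enumerating all truth assignments, intersecting the models of the premises, and testing the subset relation, is small enough to spell out in code. The representation below (Python dicts as models) is an illustrative choice of mine, not the lecture's notation:

```python
from itertools import product

symbols = ["Rain", "Snow", "Traffic"]

# Every model assigns True/False to each of the three symbols: 2^3 = 8 rows.
models = [dict(zip(symbols, bits)) for bits in product([True, False], repeat=3)]

premise1 = lambda m: m["Rain"] or m["Snow"]           # Rain v Snow
premise2 = lambda m: (not m["Snow"]) or m["Traffic"]  # !Snow v Traffic
conclusion = lambda m: m["Rain"] or m["Traffic"]      # Rain v Traffic

# Models of the knowledge base = intersection of the premises' models
# (the "darker red area" in the lecture's picture).
kb_models = [m for m in models if premise1(m) and premise2(m)]

# Entailment: every KB model must also satisfy the conclusion.
print(all(conclusion(m) for m in kb_models))   # True: this instance is sound
```

Enumerating all 2^n models like this is exactly why model checking doesn't scale, which motivates inference rules in the first place, but for three symbols it makes the subset check completely explicit.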
conjunctive normal form, a CNF, formula is a conjunction of clauses. [00:10:44] So an example of that: you have a clause A ∨ B ∨ ¬C, you have another clause ¬B ∨ D, and the AND of these two clauses is in conjunctive normal form. You can think of this as the equivalent of having a knowledge base where each formula is a clause; when you have a bunch of formulas in your knowledge base, you're basically thinking about the AND of those formulas. So a knowledge base is basically a conjunction of a bunch of formulas that could be written, let's say, as clauses.

[00:11:22] All right, so then basically every formula that is written in propositional logic can be converted into conjunctive normal form, into a new formula in conjunctive normal form that's exactly equivalent: the models of the old formula are exactly equal to the models of the new formula. So how can we do that? There's actually a kind of easy way of doing it; there's just a recipe for converting every formula to conjunctive normal form. Let's look at an example. Let's say you have a formula that says Summer implies Snow, and the whole thing implies Bizarre: (Summer → Snow) → Bizarre.

[00:12:02] Here I don't have any ANDs or ORs, right? I have these implications, so I need to get rid of them. How can I do that? I can basically remove an implication and write it out in the form that I talked about earlier, which is the negation of the first term OR the second term. So this outer implication I can write as the negation of the whole first term OR the second term: ¬(Summer → Snow) ∨ Bizarre. I can remove the inner implication and write it in a similar way, giving ¬(¬Summer ∨ Snow) ∨ Bizarre. So now what I'm going to do is push the negation inside using De Morgan's law: pushing the negation inside turns the OR into an AND, and I get a double negation, which I can get rid of, making Summer positive: (Summer ∧ ¬Snow) ∨ Bizarre.

[00:12:44] So now I have a bunch of literals, positive or negative, and I only have ANDs and ORs, but this is actually not in conjunctive normal form, right? Because conjunctive normal form means an AND of a bunch of ORs, and this is actually the opposite: this is an OR of a bunch of ANDs. But you can actually distribute this OR over the AND, and if you distribute the OR over the AND, you end up with these two clauses: Summer ∨ Bizarre, and another clause which is ¬Snow ∨ Bizarre.

[00:13:15] Okay, so you end up in CNF form; any formula you give me, I can put into CNF form. So the general recipe for it is: if you have bidirectional implications, replace them with implications and ANDs, writing each bidirectional implication out as implications ANDed together; if you see an implication, write it out in the form of a negation and an OR; if you have any negations, move them inside using De Morgan's laws; if you have double negations, remove the double negations; and then at the end, just distribute OR over AND wherever you have anything of that form, and you'll end up in conjunctive normal form. So that is the general recipe for converting any propositional logic formula to CNF form. [00:14:03] And then, why are we writing this in CNF form? Because the resolution rule works
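As a sketch, the recipe above can be turned into code almost step for step. Formulas here are nested tuples such as ("implies", a, b); this tiny AST and the function names are my own, chosen just to mirror the lecture's steps, not any standard library API:

```python
def elim_implies(f):
    """Step 1: rewrite (a -> b) as (!a v b), recursively."""
    op = f[0]
    if op == "var":
        return f
    if op == "not":
        return ("not", elim_implies(f[1]))
    if op == "implies":
        return ("or", ("not", elim_implies(f[1])), elim_implies(f[2]))
    return (op, elim_implies(f[1]), elim_implies(f[2]))

def push_neg(f):
    """Step 2: push negations inward (De Morgan), dropping double negations."""
    op = f[0]
    if op == "var":
        return f
    if op == "not":
        g = f[1]
        if g[0] == "var":
            return f
        if g[0] == "not":                        # !!a  ->  a
            return push_neg(g[1])
        flip = "or" if g[0] == "and" else "and"  # De Morgan swaps and/or
        return (flip, push_neg(("not", g[1])), push_neg(("not", g[2])))
    return (op, push_neg(f[1]), push_neg(f[2]))

def distribute(f):
    """Step 3: distribute OR over AND until we have an AND of clauses."""
    op = f[0]
    if op in ("var", "not"):
        return f
    a, b = distribute(f[1]), distribute(f[2])
    if op == "or" and a[0] == "and":
        return ("and", distribute(("or", a[1], b)), distribute(("or", a[2], b)))
    if op == "or" and b[0] == "and":
        return ("and", distribute(("or", a, b[1])), distribute(("or", a, b[2])))
    return (op, a, b)

def to_cnf(f):
    return distribute(push_neg(elim_implies(f)))

# The lecture's example: (Summer -> Snow) -> Bizarre
f = ("implies", ("implies", ("var", "Summer"), ("var", "Snow")), ("var", "Bizarre"))
print(to_cnf(f))
# An ("and", ...) of two clauses: (Summer v Bizarre) and (!Snow v Bizarre)
```

One caveat worth knowing: distributing OR over AND can blow up the formula size exponentially in the worst case, which is part of why CNF conversion is cheap to describe but not always cheap to run.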
only on clauses, which is to say it only works on CNF formulas.

[00:14:14] All right, so what's the idea of the resolution algorithm? Well, why are we trying to run resolution? The reason is that in general you might be asking me whether some formula f is true or not; we care about having that assistant that we can ask things of, or tell things to, and what does that assistant do? It tries to check things like entailment. So if you want to check whether the knowledge base entails a new formula f, that's the same thing, right, as checking whether the knowledge base contradicts ¬f, or basically checking whether ¬f added to the knowledge base is unsatisfiable or not.

[00:14:54] So how do we run the resolution-based algorithm? Well, what we do is: if you ask me whether f is entailed or not, I'll add ¬f to my knowledge base, and then I convert all my formulas to CNF form (we can do that), and once I have everything in CNF form, I can repeatedly apply resolution until everything has converged, and then I return entailment if and only if I derive false. So that is how we run resolution if we want to answer a question about entailment.

[00:15:27] Let's look at an example here. Say I have a knowledge base, and it has a bunch of things in it; they're not in CNF form, they're not in clause form or anything, but I have a bunch of formulas, and you're asking me whether this knowledge base entails a new formula, and that new formula is C. So how do I check that using resolution? What I'm going to do is add ¬C to my knowledge base, and convert everything to CNF form [00:15:53] using that recipe I talked about: removing implications, pushing negations in, and distributing ORs over ANDs. Once I do that, everything is in clause form; I have clauses and I have literals. So this is my knowledge base, with everything in clause form, in CNF form.

[00:16:11] And then I'm going to repeatedly apply resolution. So how do I apply resolution? Let's start from these two: I have A, and I have ¬A ∨ B ∨ C. In some sense A and ¬A get cancelled out, so I can add B ∨ C to my knowledge base using resolution. Okay, I have ¬B in my knowledge base, so ¬B and B get cancelled out, and I can add C to my knowledge base. And I've added ¬C to my knowledge base, so ¬C and C get cancelled out, and I get false. [00:16:46] So after repeatedly applying resolution here I'm getting false, meaning that when I added the negation of the formula, I was able to get this contradiction, I was able to get false, and what that means is that the knowledge base actually entails the formula, the formula being C in this case. So KB entails C; yes, I can derive C.

[00:17:05] All right, so a good question to ask is: what is the time complexity of these algorithms? So if you remember modus ponens, the idea of modus ponens, in the more general form of it, was that at every step we would add at most one propositional symbol to our knowledge base, and if you're adding one propositional symbol at a time and you have, say, n of them, you have at most n things to go over. So this would be a linear-time algorithm; running modus ponens is pretty simple;
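Before moving on, the whole entailment procedure from the last few paragraphs (add the negated query, saturate under resolution, and report entailment exactly when false, the empty clause, appears) can be sketched as follows. The clause encoding, frozensets of string literals with "!" for negation, is again my own illustrative choice, not the course's code:

```python
def negate(lit):
    """Complementary literal: "B" <-> "!B"."""
    return lit[1:] if lit.startswith("!") else "!" + lit

def resolvents(c1, c2):
    """Every clause obtained by cancelling one complementary pair."""
    for lit in c1:
        if negate(lit) in c2:
            yield frozenset((c1 - {lit}) | (c2 - {negate(lit)}))

def entails(kb_clauses, negated_query):
    """True iff the KB entails the query: add the negated query and
    resolve until the empty clause (false) appears or we converge."""
    clauses = set(kb_clauses) | {negated_query}
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                for r in resolvents(c1, c2):
                    if not r:          # empty clause = false: contradiction found
                        return True
                    new.add(r)
        if new <= clauses:             # converged without deriving false
            return False
        clauses |= new

# The lecture's example, already in clause form: KB = {A, !A v B v C, !B},
# query C, so we add the negated query !C.
kb = [frozenset({"A"}), frozenset({"!A", "B", "C"}), frozenset({"!B"})]
print(entails(kb, frozenset({"!C"})))   # True: KB entails C
```

The saturation loop terminates because there are only finitely many clauses over a finite set of symbols, but as the complexity discussion that follows notes, the number of derivable clauses can be exponential in the worst case.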
Modus ponens also converges fairly quickly, because there are n things that we need to go over. [00:17:44] But when we think about this other inference rule, resolution: when we're running resolution, we are adding many propositional symbols back to our knowledge base, and in the worst case you're adding all the subsets, all the disjunctions of these symbols, into your knowledge base by the end. So what that means is you have to go over all of them, and that takes exponential time, right? So in terms of time complexity, running resolution takes exponential time.

[00:18:15] And it's actually not surprising that it takes exponential time, if you think about what resolution is doing: it's actually trying to solve a satisfiability problem. You have these clauses, and you want to check satisfiability; here you're doing model checking, and satisfiability is known to be NP-complete, so it's not surprising that running resolution until convergence actually takes exponential time.

[00:18:41] So there are really some trade-offs here. If you think about using Horn clauses, you could use modus ponens, and the nice thing about it is that it's going to be linear time, but it is less expressive: you're not able to represent everything in propositional logic, you're only limited to Horn clauses. But Horn clauses turn out to be kind of useful for many applications, especially some applications in programming languages, so in those applications it does make sense to use modus ponens, because it's faster, it takes linear time. On the other hand, if you really care about all of propositional logic, then you really care about dealing with any type of clauses, and there you have to use resolution; but the problem with resolution is that it's trying to solve an NP-complete problem, and it takes exponential time.

================================================================================
LECTURE 047
================================================================================
Logic 7 - First Order Logic | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=Z-O0Q3_oTJM
---
Transcript

[00:00:05] Okay, so in this module we would like to talk about first-order logic. So far we've been talking about propositional logic: we've talked about the syntax of propositional logic and its semantics, and we've also talked about a few different inference rules; we've talked about modus ponens and resolution. Okay, and now we want to extend our logic and make it a little bit fancier, make it a little bit more complicated, and think about first-order logic. So the first question to ask is: why do we even want to do that? Why is propositional logic
not enough? If you remember, we talked about resolution, and resolution was taking an exponential amount of time, and it seemed pretty powerful, right? If you can do that in propositional logic, it seems pretty useful, pretty powerful. So what are some of the limitations of propositional logic? [00:00:53] Let me show that with one example. Imagine we start with a sentence that says "Alice and Bob both know arithmetic." If I want to write this in propositional logic, one way of doing that is to have a set of propositional symbols, one being AliceKnowsArithmetic and another propositional symbol being BobKnowsArithmetic; these can take true or false values, and one way of writing the sentence is with this particular formula, AliceKnowsArithmetic ∧ BobKnowsArithmetic. Okay.

[00:01:27] So this seems a little weird, right? It seems like something is wrong here. So what is wrong here? If I try to extend this and write something that's slightly more complicated, this style of writing symbols and ANDing them together and so on just doesn't scale; it's not expressive enough. Let's say I write "all students know arithmetic." If I'm writing that, then I'm going to list all student names and have a single propositional symbol for each student knowing arithmetic, and AND all of those together, and that just doesn't scale if I have a lot of students. If you're in a class like 221, that wouldn't really scale: I'd need to write AliceIsStudent implies AliceKnowsArithmetic, BobIsStudent implies BobKnowsArithmetic, and each one of these is going to be a symbol that takes true/false by itself, and this is going to blow up fairly quickly.

[00:02:19] Even worse, I can have a situation where I write a statement that says "every even integer greater than two is the sum of two primes." This is actually Goldbach's conjecture. So if I want to write this in logic, well, I'm kind of stuck, right? I can't write this in propositional logic, because it's talking about every even integer, and there are an infinite number of them, so I'm not going to be able to write that in propositional logic.

[00:02:45] Okay, so what can we do? It looks like if I'm using propositional logic, it's very clunky, there are a lot of propositional symbols going on, and it just wouldn't scale. But if you think about it, when you're thinking about these statements, there are some objects here, and then some relationships, some predicates, between these objects, and maybe we can use that structure. There's quite a bit of structure here, right? Like Alice being a student, or Bob being a student: being a student is a predicate on top of this object, the object being Alice or Bob. So maybe we can use that structure, and instead of defining a single propositional symbol for everything, maybe we can talk about objects and predicates instead. [00:03:34] So what that means is that here, for example, in this other example of Alice knowing arithmetic: Alice you can think of as an object, arithmetic you can think of as an object, and knowing is a predicate on top of Alice and arithmetic, and maybe we can think about that structure. And in general there's this other view, that there are some objects and some predicates on top of them, and think of
[00:03:58] In addition to that, for that example where I was talking about every integer having a property, for those types of specifications we need to think about quantifiers: we need ways of saying "for all," or ways of saying "there exists." So we need to have a way of representing these quantifiers, and to represent a quantifier we need to have a variable. When I say "for all students," I need a variable x that corresponds to every single student. So in addition to these objects and predicates, we need to have quantifiers and variables, and then we use quantifiers and variables to represent our statements.

[00:04:42] Let me give you an example. What I want to do in this module today is talk about the syntax of first-order logic, and then talk about the semantics of first-order logic. In the next module I'll be talking about inference rules. I'm not going to do them justice here, so I'm not going to go into as much detail on syntax and semantics the same way that we did in propositional logic; it's a little bit more high level here. So let me just give you a couple of examples. If I'm saying "Alice and Bob both know arithmetic," in first-order logic, ideally I would want to be able to write something of this form: the predicate Knows over the objects alice and arithmetic should be true, and the same predicate Knows over bob and arithmetic (arithmetic is the same symbol) should be true, so Knows(alice, arithmetic) ∧ Knows(bob, arithmetic). I want to be able to capture that structure of objects and predicates in first-order logic.
[00:05:42] The other thing is, going back to this other statement, all students knowing arithmetic: I should be able to have quantifiers and variables. So ideally, if I want to write this in first-order logic, I should write something of this form: for all x, the predicate Student over the variable x should imply x knowing arithmetic, that is, ∀x Student(x) → Knows(x, arithmetic). Again, Knows is the same predicate as before. So these are just examples of first-order logic, but how do we get to these statements? For that we need to define the syntax of first-order logic, so let's get into the syntax of first-order logic.

[00:06:22] All right, so let me go to my notebook. We're going to talk about first-order logic and its syntax. When you're defining the syntax of first-order logic, we have two types of things going on: we have terms, and we have formulas. If you remember propositional logic, we only had formulas; here we first need to define a set of terms, and these terms are expressions that are referring to objects.

[00:07:04] Okay, so what are terms? The first thing that we consider as a term is a constant symbol: alice, for example, or math, or arithmetic. These are constant symbols, so a constant symbol is a term. In addition to constant symbols, we need to have variables. As I was saying earlier, if you want to be able to talk about quantifiers, those quantifiers need to be defined over variables. A variable is, when I say "for all x, x does something," that x is a variable. And in addition to that, we can have functions, and these functions are defined on some of these terms.
[00:07:51] So functions can be defined on terms, and they also give us terms. For example, I can look at a function like summing over x and 3: if I'm looking at Sum(3, x), x is a variable, 3 here is a constant symbol, summing over them is a function, and that also gives me a term.

[00:08:10] All right, so that is terms. Now I can talk about formulas. The most basic form of a formula is an atomic formula. This is actually very similar to our propositional symbols in propositional logic; it's kind of like the basis of it. In propositional logic we had these propositional symbols like p, and we would define negation on top of that, but p was this propositional symbol. The atomic formula is the basis of this logic in the same way. So what is an atomic formula? An atomic formula is a predicate applied to terms. For example, Bob knowing arithmetic is an atomic formula. I can write that as Knows(bob, arithmetic): Knows is a predicate applied to the constant symbol bob and the constant symbol arithmetic, and this whole thing is an atomic formula. Once we have atomic formulas, then what we can do is operations on top of these: we can do the same things that we did in propositional logic, we can have logical connectives applied to these formulas. These logical connectives are things like negation, or, and, implication, bi-directional implication; we can apply the same sort of things, similar to propositional logic. And in addition to that, we are going to define quantifiers.
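The pieces defined so far, terms and formulas built from atomic formulas and connectives, can be sketched as a small abstract syntax tree. This is just one possible encoding (the class names and structure are my own, not the lecture's), written in Python:

```python
# Sketch (one possible encoding, not the lecture's code): the syntax so far
# as a small abstract syntax tree. Terms refer to objects; formulas are
# predicates applied to terms, combined with logical connectives.
from dataclasses import dataclass

# --- Terms: expressions referring to objects ---
@dataclass(frozen=True)
class Const:          # constant symbol, e.g. alice, arithmetic
    name: str

@dataclass(frozen=True)
class Var:            # variable, e.g. x
    name: str

@dataclass(frozen=True)
class Func:           # function applied to terms, e.g. Sum(3, x)
    name: str
    args: tuple

# --- Formulas: things that have truth values ---
@dataclass(frozen=True)
class Atom:           # atomic formula: predicate applied to terms
    pred: str
    args: tuple

@dataclass(frozen=True)
class Not:
    f: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Implies:
    left: object
    right: object

# Knows(bob, arithmetic) ∧ Knows(alice, arithmetic)
f = And(Atom("Knows", (Const("bob"), Const("arithmetic"))),
        Atom("Knows", (Const("alice"), Const("arithmetic"))))
print(f)
```

Quantifiers, defined next, would be two more node types (ForAll, Exists) holding a variable and a body formula.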
to define [00:09:46] to that we are going to define quantifiers so quantifiers is going to [00:09:49] quantifiers so quantifiers is going to be defined on top of all of these and [00:09:51] be defined on top of all of these and these quantifiers are things like for [00:09:53] these quantifiers are things like for all or there exists okay [00:09:56] all or there exists okay all right [00:09:57] all right so this defines the syntax of our first [00:10:00] so this defines the syntax of our first order logic we have terms we have [00:10:02] order logic we have terms we have formulas and then formulas atomic [00:10:05] formulas and then formulas atomic formulas are basically going to be [00:10:07] formulas are basically going to be predicates on top of terms and once we [00:10:09] predicates on top of terms and once we have the atomic formulas we can play [00:10:11] have the atomic formulas we can play around with them using the connectives [00:10:13] around with them using the connectives logical connectives or using the [00:10:14] logical connectives or using the quantifiers [00:10:16] quantifiers let's go back here so a quick recap of [00:10:19] let's go back here so a quick recap of that constant symbols like arithmetic [00:10:21] that constant symbols like arithmetic variables just like x functions like sum [00:10:23] variables just like x functions like sum of three and x and then we have formulas [00:10:26] of three and x and then we have formulas it's referring to kind of like these [00:10:28] it's referring to kind of like these truth values uh and and atomic formulas [00:10:31] truth values uh and and atomic formulas are predicates applied to terms [00:10:33] are predicates applied to terms connectives connect them for example you [00:10:35] connectives connect them for example you might say uh x is a student implies x [00:10:38] might say uh x is a student implies x knows arithmetic so this implication is [00:10:41] knows arithmetic so this implication is 
connecting this predicate on top of [00:10:43] connecting this predicate on top of symbol on top of the variable to this [00:10:46] symbol on top of the variable to this predicate on top of the variable and the [00:10:48] predicate on top of the variable and the symbol [00:10:50] symbol and then in addition to that once we [00:10:51] and then in addition to that once we have variables we can have quantifiers [00:10:53] have variables we can have quantifiers you can say for all x if x is student [00:10:56] you can say for all x if x is student that implies x knows everything okay [00:10:59] that implies x knows everything okay all right so that summarizes the syntax [00:11:01] all right so that summarizes the syntax of first order logic [00:11:03] of first order logic one quick note on quantifiers is if you [00:11:05] one quick note on quantifiers is if you think about quantifiers quantifiers are [00:11:08] think about quantifiers quantifiers are are just slightly more complicated [00:11:10] are just slightly more complicated versions of ants and ores so if you [00:11:12] versions of ants and ores so if you think about the for all quantifier the [00:11:15] think about the for all quantifier the universal quantification you can think [00:11:17] universal quantification you can think of it literally as a conjunction okay so [00:11:19] of it literally as a conjunction okay so when i say for all x p of x that's very [00:11:23] when i say for all x p of x that's very similar to saying p of a and p of b and [00:11:25] similar to saying p of a and p of b and p of c and so on okay and and and this [00:11:28] p of c and so on okay and and and this for all is kind of like being treated as [00:11:31] for all is kind of like being treated as ants between all the possible things [00:11:33] ants between all the possible things that can attack that [00:11:35] that can attack that that this x can take okay [00:11:38] that this x can take okay similarly if you talk about existential 
[00:11:40] Similarly, if you talk about existential quantification, "there exists," that is kind of like an or; you can think of it as a disjunction. If I say ∃x P(x), it's very similar to saying P(a) ∨ P(b) ∨ P(c) and so on. And if I have a finite number of them, then I can actually unroll this and enumerate all of them.

[00:11:59] Okay, so if "for all" and "there exists" are kind of like and and or, then I can apply De Morgan's laws. What that means is, if I have a negation outside of one of these quantifiers, if I say ¬∀x P(x), that is equivalent to saying ∃x ¬P(x). Why? Because the ands are going to be flipped to become ors, so the "for all" becomes "there exists" and the negation moves inside, just like De Morgan's laws apply to ands and ors.
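Over a finite domain this unrolling is directly executable: "for all" is a big conjunction (Python's `all`) and "there exists" is a big disjunction (`any`). A small sketch (my own helper names, not from the lecture) that also exhaustively checks the quantifier form of De Morgan's law on a 3-element domain:

```python
# Sketch (not from the lecture): unrolling quantifiers over a finite domain.
from itertools import product

domain = ["a", "b", "c"]

def forall(P, domain):
    # ∀x P(x)  ≈  P(a) ∧ P(b) ∧ P(c)
    return all(P(x) for x in domain)

def exists(P, domain):
    # ∃x P(x)  ≈  P(a) ∨ P(b) ∨ P(c)
    return any(P(x) for x in domain)

# Check De Morgan for quantifiers on every unary predicate over this
# domain: ¬∀x P(x) should be equivalent to ∃x ¬P(x). There are 2^3 = 8
# possible predicates (truth assignments), so we can enumerate them all.
for truth_values in product([False, True], repeat=len(domain)):
    P = dict(zip(domain, truth_values)).__getitem__
    assert (not forall(P, domain)) == exists(lambda x: not P(x), domain)
print("De Morgan for quantifiers holds on all 8 predicates")
```

This only works because the domain is finite; for statements like Goldbach's conjecture, which range over all integers, the unrolling never terminates, which is exactly why quantifiers are needed as primitives.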
[00:12:36] Another point I want to make here: when we say "for all x there exists y," we can't flip the order. Again, just like and and or, you can't really flip this order. If I say "for all x there exists y such that x knows y," that's pretty different from saying "there exists y such that for all x, x knows y": ∀x ∃y Knows(x, y) versus ∃y ∀x Knows(x, y). So we can't simply flip their order; do not do that.

[00:12:56] Okay, so now that we know the syntax of first-order logic, let's talk about how we can start from natural language and write first-order logic. If you think about universal quantification, when we talk about "for all," the way we usually refer to that in natural language is by using a word like "every." So if I say "every student knows arithmetic," I would use the for all quantifier, ∀x, because that corresponds to every student.
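The order-of-quantifiers point can be made concrete on a tiny model. In this sketch (the relation and names are made up for illustration), everyone knows someone, but there is no single person whom everyone knows, so ∀x ∃y holds while ∃y ∀x fails:

```python
# Sketch (illustrative names, not from the lecture): quantifier order matters.
people = ["alice", "bob", "carol"]
knows = {("alice", "bob"), ("bob", "carol"), ("carol", "alice")}

def Knows(x, y):
    return (x, y) in knows

# ∀x ∃y Knows(x, y): for every person, there is someone they know.
forall_exists = all(any(Knows(x, y) for y in people) for x in people)

# ∃y ∀x Knows(x, y): there is one person whom everyone knows.
exists_forall = any(all(Knows(x, y) for x in people) for y in people)

print(forall_exists)  # True: each person knows somebody
print(exists_forall)  # False: nobody is known by everyone
```

Note the nesting of `all`/`any` mirrors the nesting of the quantifiers exactly; swapping the loops is what changes the meaning.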
[00:13:33] So a question to ask is: is this the right way of writing this natural language statement? I have "every student knows arithmetic," and suppose I write for all x, x is a student, and in addition x knows arithmetic: ∀x Student(x) ∧ Knows(x, arithmetic). But this doesn't actually correspond to the sentence; there is something a little bit subtle going on here. When you say "every student knows arithmetic," you're basically conditioning knowing arithmetic on being a student. But this statement is not doing that conditioning: it's basically saying everyone is a student and everyone knows arithmetic, and that's not right. Not everyone knows arithmetic; every student knows arithmetic. Because there is that conditioning going on, implied in the natural language, the correct way of writing this is by using an implication. If I want to write out the statement "every student knows arithmetic," I would write ∀x Student(x) → Knows(x, arithmetic): conditioned on that person being a student, x knows arithmetic, and that is an implication.

[00:14:43] We're going to have a couple of these examples in the logic assignment. I think it's a good rule of thumb to think "for all, implies" every time you see "every." This is not always true, but in general, if you see "every student" or "every person" does whatever, it's usually of the form of a for-all with an implication. How about "there exists"? Let's say we have "some student knows arithmetic." When we talk about some student, we have to use the existential quantifier.
[00:15:18] The actual correct way of writing this is to say: there exists some student, and x knows arithmetic, ∃x Student(x) ∧ Knows(x, arithmetic); an "and" is going to be sufficient here. So every time you see "some," it usually corresponds to "there exists" with an and, and every time you see "every," it usually corresponds to "for all" with an implication. Note that there are different connectives for "for all" and "there exists" when you start from natural language.

[00:15:49] Okay, let's look at a few examples; let's see if we can write these in first-order logic. The first example is: there is some course that every student has taken. How do we write this? "There is some course": there exists a y such that y is a course, ∃y Course(y). That covers the "there is some course" part.
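The "every needs an implication, some needs an and" rule can be checked on a small world. In this sketch (a hypothetical model, not the lecture's), one person is not a student, which makes the conjunction version of "every student knows arithmetic" wrongly false while the implication version behaves as intended:

```python
# Sketch (hypothetical model, not from the lecture): why "every" needs an
# implication. charlie is not a student, so ∀x Student(x) ∧ Knows(x, arith)
# is false, while ∀x Student(x) → Knows(x, arith) is true, as intended.
domain = ["alice", "bob", "charlie"]
student = {"alice", "bob"}
knows_arithmetic = {"alice", "bob"}      # every *student* knows it

# Wrong translation: forces everyone to be a student who knows arithmetic.
wrong = all(x in student and x in knows_arithmetic for x in domain)

# Correct translation, using p → q ≡ ¬p ∨ q.
right = all((x not in student) or (x in knows_arithmetic) for x in domain)

# "Some student knows arithmetic": ∃x Student(x) ∧ Knows(x, arithmetic),
# where the "and" is the right connective.
some = any(x in student and x in knows_arithmetic for x in domain)

print(wrong)  # False: charlie breaks the conjunction
print(right)  # True: the implication only constrains students
print(some)   # True: alice is a witness
```

The implication is encoded as ¬p ∨ q, the same equivalence used in propositional logic.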
[00:16:15] "That every student has taken it" is the "and" part: there is something true about this course, namely that every student has taken it. So how do we write "every student has taken it"? That says for all x, x being a student implies that that student has taken the course y. Altogether: ∃y Course(y) ∧ ∀x (Student(x) → Taken(x, y)). All right, so that's the first example.

[00:16:39] Let's look at the second example; you've seen this one earlier in this module. Every even integer greater than 2 is the sum of two primes. How do we write this? It says "every even integer," so if I see "every," I would expect a ∀x and some implication that comes later. What is that x? For all x, x is an even integer and x is greater than two. And what do I get for that even integer greater than two? That implies that the integer is going to be the sum of two primes. How do I write that? I'm going to use "there exists y" and "there exists z" to correspond to those two primes: there exists one prime and there exists another prime, such that y is a prime, and z is a prime, and the sum of y and z is equal to x, that integer. So: ∀x (EvenInt(x) ∧ Greater(x, 2)) → ∃y ∃z (Prime(y) ∧ Prime(z) ∧ Sum(y, z) = x).

[00:17:44] All right, let's look at another example: if a student takes a course and the course covers a concept, then the student knows the concept. "If" is kind of like "every"; remember, when we saw "every" we wrote a for-all with an implies, and "if" is basically the same thing. "If a student" basically means "for every student." So: for all x, x being a student; and x takes a course y, so for all courses y; and for all concepts z. These for-alls are for all students, for all courses, and for all concepts. If x is a student, and x takes y, and y is a course, and y covers z (if we wanted to be pedantic, we should also have "and z is a concept," but I'm skipping that), then what does that tell me? Then the student knows the concept: the thing that comes after the comma is the thing that comes after this implication. So: ∀x ∀y ∀z (Student(x) ∧ Takes(x, y) ∧ Course(y) ∧ Covers(y, z)) → Knows(x, z).

[00:18:57] All right, so that was going from natural language to first-order logic, and we were able to talk about the syntax of first-order logic using terms, and formulas defined over terms.
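The Goldbach translation above can be spot-checked by unrolling its quantifiers over a small finite range. This is just a sketch: the real statement ranges over all integers, which is exactly what propositional logic could not express, and the full conjecture remains open; here we only verify it for even integers up to 100:

```python
# Sketch (not from the lecture): unrolling the quantifiers in
# ∀x (Even(x) ∧ x > 2) → ∃y ∃z (Prime(y) ∧ Prime(z) ∧ y + z = x)
# over a small finite range of x.

def is_prime(n):
    """Trial division; fine for the tiny range we check here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

LIMIT = 100
holds = all(
    # ∃y ∃z: search for a prime y with a prime complement z = x - y.
    any(is_prime(y) and is_prime(x - y) for y in range(2, x))
    for x in range(4, LIMIT + 1, 2)  # the even integers greater than 2
)
print(holds)  # True for this range (the full conjecture is open)
```

The outer `all` plays the role of the ∀ and the inner `any` the role of the two ∃'s (collapsed into one search, since z is determined by y).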
defined over terms. So now let's talk about the semantics of first-order logic: how do we define the meaning, or semantics, of first-order logic? [00:19:08] If you remember how we defined semantics for propositional logic, we defined it using this idea of models — models representing a particular situation in the world.

[00:19:27] In propositional logic, a model w was a world that mapped propositional symbols to truth values; it was a truth assignment to propositional symbols. Going back to my original example, "Alice knows arithmetic and Bob knows arithmetic": if I were to write that in propositional logic, I would have the propositional symbols AliceKnowsArithmetic and BobKnowsArithmetic, and a model would assign one or zero to each of these propositional symbols. [00:19:55] That was propositional logic — how do we think about it in first-order logic?

[00:20:01] The way we think about this in first-order logic is by having a graph representation for every model. You can think about the predicates we have been talking about, like knowing arithmetic and so on, as unary or binary predicates defined over the terms we've been talking about. So a model w can be represented by a graph: [00:20:31] we have these different nodes, and each node corresponds to an object. An object is represented by a node, and we label each node with constant symbols. So node o1 is an object and it's labeled by alice; node o2 might be an object corresponding to both bob and robert; and o3 is a node corresponding to an object corresponding to arithmetic.

[00:20:55] Then what we can do is have directed edges here that capture binary predicates. So Alice knowing arithmetic corresponds to a directed edge, which corresponds to the predicate Knows applied to alice and arithmetic. For unary predicates, we just put the predicate on top of the node — Alice being a student, say.

[00:21:19] All right, so that defines a model here. A model in first-order logic has two components. First, it assigns constant symbols to objects: alice corresponds to node o1, bob corresponds to node o2, and arithmetic corresponds to node o3. Second, predicate symbols map to sets of tuples: the predicate Knows gives us the tuples (o1, o3) and (o2, o3) — o1 knows o3, and o2 knows o3.
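As a sketch, the graph model just described can be written down in code. This is a toy representation under assumed names (`constants`, `unary`, `binary`, `holds` are illustrative, not from the lecture):

```python
# A first-order model as a labeled graph, using the lecture's
# alice/bob/arithmetic example (toy representation, assumed names).

# Component 1: constant symbols map to objects (nodes).
constants = {"alice": "o1", "bob": "o2", "arithmetic": "o3"}

# Component 2: predicate symbols map to sets of tuples of objects.
unary = {"student": {"o1", "o2"}}                 # labels on nodes
binary = {"knows": {("o1", "o3"), ("o2", "o3")}}  # directed edges

def holds(predicate, *args):
    """Check whether an atomic formula is true in this model."""
    objs = tuple(constants[a] for a in args)
    if len(objs) == 1:
        return objs[0] in unary.get(predicate, set())
    return objs in binary.get(predicate, set())

print(holds("knows", "alice", "arithmetic"))  # True: edge (o1, o3)
print(holds("student", "arithmetic"))         # False: o3 is not labeled student
```

The two dictionaries correspond exactly to the two components of the model: the constant-symbol assignment and the predicate interpretation as tuples.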
[00:21:51] knowing o3 so so that corresponds to w of knowing that predicate of knowing [00:21:53] of knowing that predicate of knowing which basically gives us these two poles [00:21:55] which basically gives us these two poles either either they could be binary or [00:21:57] either either they could be binary or unary depending on depending on their [00:21:59] unary depending on depending on their predicate [00:22:01] predicate all right so the way we are defining a [00:22:03] all right so the way we are defining a model is a little bit more complex than [00:22:05] model is a little bit more complex than the way we defined modeled in [00:22:06] the way we defined modeled in propositional logic [00:22:09] propositional logic so so there are a few other restrictions [00:22:12] so so there are a few other restrictions that we are putting on models to make [00:22:13] that we are putting on models to make our lives a little bit easier so so if [00:22:16] our lives a little bit easier so so if you remember um okay so we can have um [00:22:18] you remember um okay so we can have um basically a statement that says john and [00:22:21] basically a statement that says john and bob are students right so how do i write [00:22:23] bob are students right so how do i write this and and first order logic i can say [00:22:26] this and and first order logic i can say john a student and and all the students [00:22:29] john a student and and all the students so students predicate on top of john and [00:22:31] so students predicate on top of john and bob [00:22:32] bob so if i think about a model [00:22:34] so if i think about a model corresponding to that i can have a note [00:22:36] corresponding to that i can have a note 01 corresponding to john and and student [00:22:39] 01 corresponding to john and and student predicate student on top of this node [00:22:41] predicate student on top of this node and i can have node o2 responding to bob [00:22:44] and i can have node o2 responding 
to bob and student on top of this [00:22:46] and student on top of this but that's one option right one could [00:22:49] but that's one option right one could have other types of models that [00:22:51] have other types of models that represent this i can have a single note [00:22:53] represent this i can have a single note and i can say well this person's name is [00:22:55] and i can say well this person's name is both john and bob maybe john and bob [00:22:58] both john and bob maybe john and bob are the same people and and uh we're [00:23:01] are the same people and and uh we're talking about uh both of them being a [00:23:03] talking about uh both of them being a student so one other option is w2 one [00:23:06] student so one other option is w2 one other way of representing this model is [00:23:07] other way of representing this model is w2 where i just write one note or maybe [00:23:10] w2 where i just write one note or maybe i have three notes maybe i have like [00:23:12] i have three notes maybe i have like this other unnamed note here that that [00:23:14] this other unnamed note here that that doesn't have anyone assigned to it so [00:23:17] doesn't have anyone assigned to it so the restriction that we are putting in [00:23:18] the restriction that we are putting in here is basically trying to make sure [00:23:20] here is basically trying to make sure that w2 and w3 doesn't happen so so [00:23:24] that w2 and w3 doesn't happen so so basically we are putting this unique [00:23:25] basically we are putting this unique names assumption which says each object [00:23:28] names assumption which says each object has at most one constant symbol for it [00:23:30] has at most one constant symbol for it and this basically rules out w and in [00:23:33] and this basically rules out w and in addition to the w [00:23:35] addition to the w sorry this rules that w two basically we [00:23:37] sorry this rules that w two basically we can't have both john and bob associated 
[00:23:40] can't have both john and bob associated to the single node to the single object [00:23:42] to the single node to the single object so we can have at most one constant [00:23:44] so we can have at most one constant symbol [00:23:45] symbol and in addition to that you're going to [00:23:46] and in addition to that you're going to have another assumption on domain [00:23:48] have another assumption on domain closure which basically says each object [00:23:50] closure which basically says each object has at least one constant symbol so so [00:23:53] has at least one constant symbol so so we can't have an object corresponding to [00:23:55] we can't have an object corresponding to o2 here that doesn't have any symbols [00:23:57] o2 here that doesn't have any symbols assigned to it so this rules out double [00:23:59] assigned to it so this rules out double e3 so this basically ensures that when [00:24:03] e3 so this basically ensures that when when we have a constant symbol a [00:24:04] when we have a constant symbol a constant symbol is equivalent to having [00:24:06] constant symbol is equivalent to having an object if i have an object there is [00:24:08] an object if i have an object there is one single constant [00:24:10] one single constant constant symbol that is assigned to it [00:24:12] constant symbol that is assigned to it okay [00:24:13] okay so why am i trying to do this like what [00:24:15] so why am i trying to do this like what would this buy me so the thing that [00:24:18] would this buy me so the thing that despised me this one to one mapping that [00:24:20] despised me this one to one mapping that we have between constant symbols and [00:24:22] we have between constant symbols and objects like using using these two [00:24:23] objects like using using these two assumptions that i've put allows me to [00:24:26] assumptions that i've put allows me to do to do an operation that called [00:24:28] do to do an operation that called 
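The two assumptions are easy to state as code. A minimal sketch, where a model's naming is just a dict from constant symbols to objects (the w1/w2/w3 naming follows the lecture's example; the function names are assumptions):

```python
# Unique names: each object has AT MOST one constant symbol
# (the symbol-to-object mapping is injective).
def unique_names(constants):
    objs = list(constants.values())
    return len(objs) == len(set(objs))

# Domain closure: each object has AT LEAST one constant symbol
# (the mapping is onto the set of objects).
def domain_closure(constants, objects):
    return set(constants.values()) == set(objects)

w1 = {"john": "o1", "bob": "o2"}          # one symbol per object: fine
print(unique_names(w1))                    # True
print(domain_closure(w1, {"o1", "o2"}))    # True

w2 = {"john": "o1", "bob": "o1"}          # john and bob name one object
print(unique_names(w2))                    # False: violates unique names

# w3: an extra unnamed object o3 in the domain
print(domain_closure(w1, {"o1", "o2", "o3"}))  # False: violates domain closure
```

Together the two checks say exactly that the mapping is a bijection between constant symbols and objects.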
[00:24:29] Using these two assumptions allows me to do an operation called propositionalization, and what that buys me is the ability to use the inference rules we had in propositional logic. The whole reason I'm doing this is so I can use ideas from propositional logic, like resolution or modus ponens, when it comes to inference in first-order logic. [00:24:46] If you think about it, with this restriction first-order logic is not anything fancy: it's just syntactic sugar on top of propositional logic. It helps us write things a little more expressively and have an easier time writing things out, but at the end of the day it's the same sort of logic [00:25:05] that goes on behind everything.

So, for example, suppose we have this example knowledge base in first-order logic. We might say Alice is a student and Bob is a student: Student(alice) and Student(bob). Every student is a person: for all x, Student(x) implies Person(x). And some students are creative: there exists an x where Student(x) and Creative(x). Okay, so that's my knowledge base in first-order logic.

[00:25:28] One can write this exact knowledge base in propositional logic. Based on the assumption that every constant symbol has a one-to-one mapping to an object, I can simply write: Student(alice) and Student(bob) — both of these are now propositional symbols, and I take the and of them. Then Student(alice) implies Person(alice), where Person(alice) is another propositional symbol, and Student(bob) implies Person(bob). And (Student(alice) and Creative(alice)) or (Student(bob) and Creative(bob)) is what I get from that last statement.

================================================================================ LECTURE 048
================================================================================
Logic 8 - First Order Modus Ponens | Stanford CS221: Artificial Intelligence (Autumn 2021)
Source: https://www.youtube.com/watch?v=mndzhfBpyUw
---
Transcript

[00:00:05] Okay, so far we've been talking about first-order logic and its syntax and semantics, and now what we'd like to do is talk about inference rules for first-order logic. In this module we're going to be talking about modus ponens when we have only Horn clauses, and in the next module we'll talk about resolution for first-order logic.

[00:00:27] All right, so if you remember, what inference rules do is symbol manipulation. They take the formulas — the syntactic form of the formulas — and they have no notion of meaning or anything of that form; but based on the formulas that are in the knowledge base, they try to infer, derive, or prove a new formula from what exists by syntactically moving things around, kind of like what we saw with modus ponens for propositional logic.

[00:00:54] So what we'd like to do is focus on applying modus ponens to first-order logic in a scenario where we have only Horn clauses. If you remember, Horn clauses were definite clauses and goal clauses, and definite clauses had the form of some set of propositional symbols — p1 and p2, for example — implying some q: some positive literals and-ed with each other, implying a new positive literal. So how do we extend that idea of a definite clause to the space of first-order logic?

[00:01:28] If you look at definite clauses in first-order logic, you're going to have a set of variables, with quantifiers on top of them. For example, here's a definite clause: for all x, for all y, for all z, the predicate Takes(x, y) and-ed with another predicate Covers(y, z) implies a whole new predicate Knows(x, z). So we have these atomic formulas and-ed with each other, a set of quantifiers outside, and this implication.

[00:02:00] So if you propositionalize here, we get one formula for each value of x, y, and z. If you remember propositionalization from the last module, we can think about x, y, and z taking specific values — x being alice, y being cs221, and z being mdp — and if you think about each of these formulas taking one value for each x, each y, and each z, we end up with propositional logic formulas that are in fact definite clauses.
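That grounding step can be sketched as a loop over all assignments of the variables. A toy propositionalization of the clause above — the constant lists and the string encoding of atoms are assumptions for illustration, not lecture code:

```python
# Ground the definite clause
#   forall x, y, z: Takes(x, y) AND Covers(y, z) -> Knows(x, z)
# over small assumed constant sets: one propositional clause per (x, y, z).
from itertools import product

people = ["alice", "bob"]   # values for x (assumed)
courses = ["cs221"]         # values for y (assumed)
concepts = ["mdp"]          # values for z (assumed)

clauses = []
for x, y, z in product(people, courses, concepts):
    premises = [f"Takes({x},{y})", f"Covers({y},{z})"]
    conclusion = f"Knows({x},{z})"
    clauses.append((premises, conclusion))

for premises, conclusion in clauses:
    print(" AND ".join(premises), "->", conclusion)
# Takes(alice,cs221) AND Covers(cs221,mdp) -> Knows(alice,mdp)
# Takes(bob,cs221) AND Covers(cs221,mdp) -> Knows(bob,mdp)
```

Each ground clause is a propositional definite clause, which is exactly why propositional inference rules become applicable after grounding.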
[00:02:34] But we'd like to be able to represent this in this more expressive way, and because of that we define definite clauses in first-order logic using these variables, these quantifiers, and so on. More formally, a definite clause has the following form: for all x1 through xn, where x1 through xn are variables, we have atomic formulas A1 through Ak and B — all of these are atomic formulas — and A1 and-ed through Ak implies B. Remember, these atomic formulas actually contain the variables x1 through xn inside them, kind of like the example up here.

[00:03:23] All right, so that's a definite clause in first-order logic. How can we do modus ponens in first-order logic? If this is a definite clause — for all x1 through xn, A1 and-ed through Ak implies B — one possible attempt, maybe our first attempt at modus ponens, is: we have this, and in addition maybe our knowledge base has A1 through Ak, and based on these premises maybe we can conclude B.

[00:03:56] So does this definition of modus ponens work? Let's look at an example. It turns out it actually doesn't work. Consider this example: we have P(alice) — P is a predicate applied to alice, and maybe that plays the role of our A1 — and then we say for all x, P(x) implies Q(x). Ideally, what should I get from this? Ideally I'd like to get Q(alice). But I'm really not able to do that. Why not? Because, remember, modus ponens is an inference rule, and inference rules don't know anything about semantics or meanings — they're just matching symbols. And if I'm just matching symbols, first off, P(alice) has nothing to do with P(x): I can't match P(alice) and P(x), so I'm stuck — I can't apply this modus ponens idea at all. And even if I could somehow say P(alice) and P(x) are the same thing, I'm still not going to be able to get Q(alice), because Q(alice) and Q(x) are very different things. So I can't infer Q(alice), and P(x) and P(alice) don't match either. The modus ponens rule I've written here just doesn't work; this is not the modus ponens we should be using in first-order logic.
[00:05:19] So how are we going to solve this? There are two ideas I'm going to talk about in this module: substitution and unification. Substitution and unification are the things that are going to improve our modus ponens and help us apply modus ponens in first-order logic. So let's look at what they are.

[00:05:35] What is substitution? Substitution takes a substitution rule that substitutes a variable with a term, takes a formula, and substitutes all of those variables in the formula with the terms it is given. One thing to notice is that it substitutes a variable, like x, with a term — and a term, if you remember our module on the syntax of first-order logic, is either a constant symbol, or another variable, or a function. So in this example, alice is a constant symbol, and I'm replacing the variable x with the constant symbol alice.

[00:06:18] Here's another example: I'm substituting x with alice, and substituting y with z — another variable — in the formula P(x) and K(x, y). I'm basically doing find-and-replace: find x, replace it with alice; find y, replace it with z. That's what substitution does. So a substitution θ is a mapping from variables to terms, and Subst[θ, f] returns the result of performing that substitution on the formula f.
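The find-and-replace view of substitution can be sketched directly. Assumptions here (not the lecture's code): atomic formulas are nested tuples like `("K", "x", "y")`, whose first element is a predicate or function symbol, and anything appearing as a key in θ is treated as a variable:

```python
# Sketch of Subst[theta, f]: recursively replace variables with terms.
def substitute(theta, f):
    """Apply substitution theta (dict: variable -> term) to formula f."""
    if isinstance(f, tuple):  # predicate or function application
        return (f[0],) + tuple(substitute(theta, arg) for arg in f[1:])
    return theta.get(f, f)    # variables get replaced; constants pass through

# Find x, replace with alice:
print(substitute({"x": "alice"}, ("P", "x")))                 # ('P', 'alice')
# Find x -> alice and y -> z in K(x, y):
print(substitute({"x": "alice", "y": "z"}, ("K", "x", "y")))  # ('K', 'alice', 'z')
# Terms can be functions too: f(x) becomes f(alice):
print(substitute({"x": "alice"}, ("Knows", "x", ("f", "x"))))
# ('Knows', 'alice', ('f', 'alice'))
```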
Okay, so that was substitution. What does unification do? Unification takes two formulas and tries to match them as closely as possible, and it returns a substitution rule that matches those formulas as closely as possible. [00:07:13] So if I do Unify[Knows(alice, arithmetic), Knows(x, arithmetic)], I have these two formulas, I try to match them as closely as possible, and the substitution rule that does it is: replace the variable x with alice. That's what I'm going to return.

[00:07:29] Let's look at another example: Unify[Knows(alice, y), Knows(x, z)]. What is a substitution rule that gets me there? I'll get a substitution rule that says: replace the variable x with alice, and replace the variable y with z. That's the substitution rule I get out of unifying these two formulas.

[00:07:51] Here's another example: Unify[Knows(alice, y), Knows(bob, z)]. This is going to return "fail". The reason is that I'm not able to substitute a constant symbol with another constant symbol. Remember, we substitute variables with terms, and in the first arguments there are no variables to substitute — there are two constant symbols. So I can't substitute these, and I get "fail" from unification here.

[00:08:18] And here's another example: Unify[Knows(alice, y), Knows(x, f(x))] — with a function here. A substitution rule is: take the variable x and replace it with alice, and take the variable y and replace it with f(alice). I'm taking the most general form of this: I could have had f(x) here, but because I already know in my substitution rule that x needs to be replaced by alice, instead of putting f(x) I put f(alice) — I've already replaced x by alice.
substitution which is the most general form of a [00:08:57] which is the most general form of a unifier so to unify f and g two formulas [00:09:00] unifier so to unify f and g two formulas return of theta so then if i if i do [00:09:03] return of theta so then if i if i do substitution of theta and f that gives [00:09:05] substitution of theta and f that gives me the same thing as substitution of [00:09:08] me the same thing as substitution of theta in g and it returns fail if such a [00:09:11] theta in g and it returns fail if such a such a substitution doesn't exist okay [00:09:14] such a substitution doesn't exist okay so why am i defining these so the reason [00:09:16] so why am i defining these so the reason i'm defining unification and [00:09:17] i'm defining unification and substitution is i can now modify my [00:09:20] substitution is i can now modify my modus ponens and i can use this idea of [00:09:22] modus ponens and i can use this idea of substitution and unification in order to [00:09:24] substitution and unification in order to make modus ponens work in first order [00:09:26] make modus ponens work in first order logic so here i'm going to have [00:09:29] logic so here i'm going to have different a1 prime through a k prime [00:09:31] different a1 prime through a k prime these atomic formulas from a1 through ak [00:09:34] these atomic formulas from a1 through ak and different b prime than b these are [00:09:36] and different b prime than b these are going to be different atomic formulas [00:09:38] going to be different atomic formulas okay specifically if you think about it [00:09:41] okay specifically if you think about it these a ones prime through a k prime are [00:09:44] these a ones prime through a k prime are are groundings of this a1 through a k [00:09:47] are groundings of this a1 through a k which basically operate on on these [00:09:50] which basically operate on on these these variables x's and b again operates [00:09:53] these variables 
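The unification examples above, together with the formal Unify/Subst definition, can be sketched in a few lines of Python. This is a hedged illustration, not course code: the term representation (nested tuples, lowercase strings as variables, capitalized strings as constants, so the constant "arithmetic" is spelled "Arithmetic" here), the names `unify`/`subst`, and the omission of the occurs check are all my own simplifications.

```python
# A minimal unification sketch. Terms are nested tuples: variables are
# lowercase strings, constants are capitalized strings, and compound
# terms/atoms are tuples like ("Knows", "Alice", "y").

def is_var(t):
    return isinstance(t, str) and t[0].islower()

def subst(theta, t):
    """Apply substitution theta (dict: variable -> term) to term t."""
    if is_var(t):
        return subst(theta, theta[t]) if t in theta else t
    if isinstance(t, tuple):
        return tuple(subst(theta, a) for a in t)
    return t

def unify(f, g, theta=None):
    """Return the most general unifier of f and g, or None (fail)."""
    theta = {} if theta is None else theta
    f, g = subst(theta, f), subst(theta, g)
    if f == g:
        return theta
    if is_var(f):
        return {**theta, f: g}
    if is_var(g):
        return {**theta, g: f}
    if isinstance(f, tuple) and isinstance(g, tuple) and len(f) == len(g):
        for a, b in zip(f, g):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None  # e.g. two different constant symbols

# The lecture's four examples:
print(unify(("Knows", "Alice", "Arithmetic"), ("Knows", "x", "Arithmetic")))
# -> {'x': 'Alice'}
print(unify(("Knows", "Alice", "y"), ("Knows", "x", "z")))
# -> {'x': 'Alice', 'y': 'z'}
print(unify(("Knows", "Alice", "y"), ("Knows", "Bob", "z")))
# -> None: Alice vs Bob, one constant cannot be substituted for another
print(unify(("Knows", "Alice", "y"), ("Knows", "x", ("F", "x"))))
# -> {'x': 'Alice', 'y': ('F', 'Alice')}
```

Note the last case: because x is already bound to Alice, the binding for y is F(Alice) rather than F(x), matching the "most general form" discussed above.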
And b again operates on a variable x, while b' you can think of as a grounding of b. Now, b' and b, or a1' through ak' and a1 through ak, don't look the same, right? That's why I can't just syntactically replace them by each other. But what I can do is use substitution and unification: first, I take my a1' through ak', my groundings, and these other atomic formulas a1 through ak, and I unify them.
[00:10:18] Once I unify them, I get a substitution θ, and then I can derive b'. And what is b'? b' is the result of substituting θ in b, and that is going to be my new modus ponens rule. So I'll end up with a grounded version of b, namely b'. How do I get it? By substituting θ in b. And where do I get θ? By unifying a1' through ak' with a1 through ak.
[00:10:47] Okay, let's look at an example. Say that in my knowledge base I have a premise that says Takes(Alice, CS221); this is my a1', a grounded version of Takes(x, y). And then I have Covers(CS221, MDP), again a grounded version of Covers(y, z). So first I do a unification of these two formulas with those two formulas, and based on that unification I get a substitution rule: take variable x and replace it by Alice, take variable y and replace it by CS221, and take variable z and replace it by MDP.
[00:11:29] And then what am I going to return out of modus ponens? Modus ponens basically tells me: this is your b, and you want to return a modified version of b. What is that modified version? It's the result of using this substitution rule on your b, on this Knows(x, z). So if I substitute θ in Knows(x, z), I get Knows(Alice, MDP), and that is the thing I'm going to be deriving here, or proving here. That's basically applying modus ponens in first-order logic.
[00:12:02] So let's think about the complexity of this: what is the time complexity here, and how bad is it? If you remember, when we were doing modus ponens in propositional logic, every time we ran modus ponens we were adding one propositional symbol. Similarly here, every time you run modus ponens you're only adding one atomic formula, which is actually pretty good.
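The Takes/Covers derivation above can be played out in code. A minimal sketch under my own assumptions: atoms are flat tuples, lowercase arguments are variables, capitalized ones are constants, and the names `unify_atoms`/`modus_ponens` are illustrative, not the course's implementation.

```python
# A toy run of first-order modus ponens on the lecture's example.

def unify_atoms(ground_atoms, pattern_atoms):
    """Unify ground atoms a1'..ak' with rule antecedents a1..ak.
    Returns the substitution theta as a dict, or None on failure."""
    theta = {}
    for ga, pa in zip(ground_atoms, pattern_atoms):
        if ga[0] != pa[0] or len(ga) != len(pa):   # predicate must match
            return None
        for g, p in zip(ga[1:], pa[1:]):
            if p[0].islower():                     # p is a variable
                if theta.setdefault(p, g) != g:    # bindings must stay consistent
                    return None
            elif p != g:                           # constant-vs-constant mismatch
                return None
    return theta

def modus_ponens(ground_atoms, rule):
    """rule = (antecedents a1..ak, conclusion b); derive b' = Subst[theta, b]."""
    antecedents, (pred, *args) = rule
    theta = unify_atoms(ground_atoms, antecedents)
    if theta is None:
        return None
    return (pred, *[theta.get(a, a) for a in args])

# KB premises: Takes(Alice, CS221) and Covers(CS221, MDP)
premises = [("Takes", "Alice", "CS221"), ("Covers", "CS221", "MDP")]
# Rule: forall x, y, z: Takes(x, y) and Covers(y, z) -> Knows(x, z)
rule = ([("Takes", "x", "y"), ("Covers", "y", "z")], ("Knows", "x", "z"))
print(modus_ponens(premises, rule))  # -> ('Knows', 'Alice', 'MDP')
```

Unification yields θ = {x: Alice, y: CS221, z: MDP}, and substituting θ into the conclusion Knows(x, z) produces the derived ground atom, exactly as in the lecture's walk-through.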
And in addition to that, if you don't have any functions, if there are no functions going on here, then the number of these atomic formulas is at most the number of constant symbols we have, raised to the power of the maximum predicate arity. So in this example I might have P(x, y, z), and maybe x takes a hundred values, y takes a hundred values, and z takes a hundred values; then I'm going to get a hundred to the power of three, which is not bad. But the thing is, if there are functions here, then we actually end up with an infinite number of them being applied to each other, so this becomes unbounded. If I have a function, I can keep applying it, and then I end up with an infinite number of things being added, because I can keep applying the function to its own result. Remember, for example, the Sum function that we saw earlier in one of the examples: we had Sum(3, x), right?
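Both counting claims above are easy to sanity-check; here is a quick illustrative script (the 100 constants and the `Sum` nesting just reuse the lecture's numbers):

```python
# Without function symbols, the ground atoms of one arity-3 predicate over
# n constant symbols number exactly n**3: large but finite.
from itertools import product

constants = [f"c{i}" for i in range(100)]               # n = 100 constants
n_atoms = sum(1 for _ in product(constants, repeat=3))  # all ground P(x, y, z)
print(n_atoms == 100 ** 3)  # -> True (a million atoms, but a finite number)

# With a function symbol like Sum, terms nest without bound, so the set of
# atomic formulas becomes infinite:
term = "x"
for _ in range(3):
    term = f"Sum(3, {term})"
print(term)  # -> Sum(3, Sum(3, Sum(3, x)))
```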
So I can keep applying Sum on top of itself and almost recreate arithmetic by applying Sum to itself, but we are going to get an unbounded number of formulas here, which is not that great. Okay.
[00:13:39] All right, what else do we know about modus ponens in this space of first-order logic? What we know is that modus ponens turns out to be complete for first-order logic with only Horn clauses. This is a similar type of completeness to what we had when we looked at modus ponens in propositional logic: again, if you are limited to Horn clauses, we have completeness in first-order logic as well.
[00:14:05] In addition to that, we know that first-order logic, even when it is restricted to Horn clauses, is only semi-decidable. So what does that mean? It means that if our knowledge base entails f, and we want to figure out whether it entails f or not, then if it actually does entail f and we keep doing forward inference, repeatedly trying to derive new formulas using modus ponens until convergence, then getting f takes finite time. So if my knowledge base actually entails f, I should be able to derive f, to prove f using just inference rules, in finite time, which is pretty nice.
[00:14:52] But the difficulty that gets me to semi-decidability is this: if the knowledge base doesn't entail f, I might not know whether it entails f or doesn't entail it. If I don't know, and the knowledge base actually doesn't entail f, it turns out that there are no algorithms that can show this in finite time.
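This asymmetry can be illustrated with a forward-inference loop run under a step budget. This is a toy over propositional-style Horn rules, with names and encoding of my own choosing: in this finite toy the loop always converges, whereas in first-order logic it is function symbols that make inference run forever on non-entailed queries.

```python
# Semi-decidability in miniature: forward inference can answer "yes, the KB
# entails the goal" in finite time, but failing to derive the goal does not,
# in general, justify answering "no" -- so we return None, not False.

def forward_infer(facts, rules, goal, budget=100):
    """rules: list of (set_of_antecedents, conclusion).
    Returns True if goal is derived within `budget` rounds, else None."""
    derived = set(facts)
    for _ in range(budget):
        if goal in derived:
            return True
        new = {b for (ants, b) in rules if ants <= derived and b not in derived}
        if not new:          # converged without deriving the goal
            return None      # "unknown" -- in general we cannot conclude "no"
        derived |= new
    return True if goal in derived else None

facts = {"Takes(Alice,CS221)", "Covers(CS221,MDP)"}
rules = [({"Takes(Alice,CS221)", "Covers(CS221,MDP)"}, "Knows(Alice,MDP)")]
print(forward_infer(facts, rules, "Knows(Alice,MDP)"))  # -> True
print(forward_infer(facts, rules, "Knows(Bob,MDP)"))    # -> None
```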
[00:15:10] Okay, and this is actually related to the halting problem: people have shown that there are no algorithms that can do this in finite time, so we are kind of screwed in that case. But in general this is not too bad: you can have a budget for the amount of time you're going to run your inference rules, run it, and see if you get lucky; if the KB actually entails f, you're going to be able to get f in finite time. So you can actually run modus ponens in first-order logic when you have Horn clauses, and it does work in those instances where the KB actually entails f.
[00:15:52] But in the next module, what we would like to talk about is going beyond modus ponens: we want to talk about resolution, specifically how resolution would work in
first-order logic.

================================================================================
LECTURE 049
================================================================================
Logic 9 - First Order Resolution | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=iG_tz7ZjZAI
---
Transcript

[00:00:05] Okay, so in this module we are going to be talking about resolution for first-order logic. This is an optional module, but I think it would be interesting to think about how we could apply resolution when we have this more complicated logic, this first-order logic. So far we have talked about syntax and semantics, and we have talked about modus ponens when we have Horn clauses in first-order logic; now we want to extend this idea of applying inference to settings where we don't necessarily have Horn clauses. If you think about first-order logic, it's not really limited to settings with Horn clauses; we sometimes have non-Horn clauses. Here's actually an example: for all x, Student(x) implies there exists a y such that Knows(x, y).
[00:00:51] Okay, so this "there exists y" here is going to create a non-Horn clause. And why is that? Because an existential quantifier is really a glorified "or", a glorified disjunction. What this is basically getting us is Knows(x, y1) or Knows(x, y2) and so on, and that creates an "or" on this side of the implication, which makes this particular statement a non-Horn clause. So what does that mean? It means I can't just apply modus ponens to it. So what can we do here?
[00:01:25] The high-level strategy is this: we have this first-order logic formula, and first off you need to convert it to CNF, to conjunctive normal form. This is similar to before: even in propositional logic, when we had something that wasn't a Horn clause, we started by writing it in CNF form. And then after that we repeatedly apply the resolution rule to it. Our resolution rule here is going to be slightly different from the resolution rule we had in propositional logic, because, similar to modus ponens, we need to do unification and substitution; so we change our resolution rule to include that element of unification and substitution.
[00:02:06] Converting to CNF is also not exactly like converting to CNF in propositional logic; there are going to be a few new things, and I'm going to attempt to give you some ideas around them. But in general I'm just giving a high-level strategy, an idea of how you would apply resolution to first-order logic; this is not a complete explanation, and in general it gets a little bit messy when you think about applying resolution to first-order logic.
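For reference, the standard first-order binary resolution rule adds exactly this unification step to the propositional rule. A sketch, in the Subst/Unify notation used earlier (the exact formulation on the course slides may differ slightly):

```latex
\frac{f_1 \lor p \qquad\qquad f_2 \lor \lnot q}
     {\mathrm{Subst}[\theta,\; f_1 \lor f_2]}
\qquad \text{where } \theta = \mathrm{Unify}[p, q] \neq \mathrm{fail}
```

That is: to resolve two clauses, unify a positive literal p from one with a negated literal q from the other, and apply the resulting substitution to everything that remains.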
So think of this as a big-picture, high-level strategy and overview for applying resolution here. Okay.
[00:02:43] All right, so let's start with a formula. Let's say this is our formula: for all x, if for all y, Animal(y) implies Loves(x, y), then there exists a y such that Loves(y, x). Okay, so this is some statement, some formula, and what we would like to do is convert it to CNF form. So what does CNF form look like in first-order logic? At the end of the day, the output is going to look something like this: an "and" of a bunch of clauses, and these are clauses because they have "or"s between them. In addition to that, we have these new functions, this capitalized Y and capitalized Z; these are called Skolem functions, and I'm going to talk about what they are in a few slides.
[00:03:30] Okay, so there are a few things that are new when we think about the CNF form. The first thing is that all the variables in this form are universally quantified: there is a "for all x" here that I've just dropped, so in reality there's a "for all x" in front. And then there are these Skolem functions, which stand for the things that were existentially quantified; basically they represent existential quantifiers, and they are functions of this x that has the "for all x" on it. So those are the two new things that happen in order to get a CNF form of a first-order logic formula.
[00:04:14] Let's actually go through an example. Let's start with the statement that says: anyone who likes all animals is liked by someone.
write this as an [00:04:25] someone okay so one can write this as an input that says for all X for all y y is [00:04:28] input that says for all X for all y y is an animal implies X LS Y and that full [00:04:31] an animal implies X LS Y and that full thing implies that there exist in y so y [00:04:35] thing implies that there exist in y so y LS X [00:04:37] LS X okay all right so uh first thing to do [00:04:41] okay all right so uh first thing to do is similar to before if you want to like [00:04:43] is similar to before if you want to like follow like the steps of converting the [00:04:45] follow like the steps of converting the stone CNF form we're going to eliminate [00:04:48] stone CNF form we're going to eliminate implication so I'm going to eliminate [00:04:49] implication so I'm going to eliminate this outside imp implication how do I [00:04:52] this outside imp implication how do I elimin eliminate it I'm going to take [00:04:54] elimin eliminate it I'm going to take the negation of what comes before it so [00:04:56] the negation of what comes before it so negation up until here or or the rest of [00:05:00] negation up until here or or the rest of the statement I'm also going to replace [00:05:04] the statement I'm also going to replace this um implication by negation of the [00:05:06] this um implication by negation of the first part or the second part so [00:05:08] first part or the second part so negation of the first part or the second [00:05:10] negation of the first part or the second part and we get this this statement okay [00:05:14] part and we get this this statement okay now I'm going to push negations inwards [00:05:16] now I'm going to push negations inwards and eliminate double negations this is [00:05:18] and eliminate double negations this is exactly what we have done before so let [00:05:20] exactly what we have done before so let me push negations inside and it goes all [00:05:22] me push negations inside and it goes all the way to 
negation of love and and now [00:05:25] the way to negation of love and and now we we have ended up with this formula [00:05:27] we we have ended up with this formula where we have these quantifiers right [00:05:29] where we have these quantifiers right like we have these for all and they [00:05:30] like we have these for all and they exist and so on and everything else is [00:05:33] exist and so on and everything else is an atomic formula right remember before [00:05:35] an atomic formula right remember before like when we were trying to convert [00:05:36] like when we were trying to convert things to a CNF form we would end up [00:05:38] things to a CNF form we would end up with propositional with propositional [00:05:40] with propositional with propositional symbols right so we would have we would [00:05:42] symbols right so we would have we would end up with propositional symbols that [00:05:44] end up with propositional symbols that that could take a positive or negative [00:05:46] that could take a positive or negative negative value right so we would have [00:05:48] negative value right so we would have positive or negative literals at the end [00:05:49] positive or negative literals at the end of the day but here we have Atomic [00:05:51] of the day but here we have Atomic formulas so so we end up with this [00:05:53] formulas so so we end up with this Atomic formulas or negations of these [00:05:55] Atomic formulas or negations of these Atomic formulas okay so now one thing [00:05:58] Atomic formulas okay so now one thing that is new is we're kind of [00:06:00] that is new is we're kind of standardizing the variables here so so [00:06:03] standardizing the variables here so so we have a y here and we have a y here [00:06:05] we have a y here and we have a y here but but there is this existential [00:06:07] but but there is this existential quantify on each of them and these y are [00:06:09] quantify on each of them and these y are kind of treated as as a 
local variable. [00:06:12] So in order to avoid confusion, you're going to define a new variable for each of them: I'm going to define a z here and keep this one as y. [00:06:20] Again, the reason I'm doing this is that at the end of the day I'm removing this "for all x", and I want to make sure that this y is a function of x and this z is a function of x, and that these are two different local variables. [00:06:33] All right, so this is new: I'm standardizing variables; this is a new step that is done here. [00:06:40] Okay, now that we are left with this formula, what we are going to do is replace all these existentially quantified variables with something called a Skolem function. [00:06:53] So before, we had "there exists a y", and this y depends on x too, right? For all x there exists a y, so this y is really a function of x: the Skolem function is y as a function of x, or z as a function of x. [00:07:11] So I'm going to write these Skolem functions as functions of the variable that is universally quantified, and then later on I'm going to drop this "for all x", which makes my life easier. [00:07:26] And then finally I need to distribute "or" over "and", so I can end up with clauses in conjunctive normal form; this is a step similar to what we've had before in propositional logic. Then I remove the universal quantifiers, and this is what I end up with: a formula in CNF, in first-order logic. [00:07:45] So just to recap what is new here: we have Skolem functions, which represent existential quantifiers, and variables that are universally quantified.
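The final conversion step just described, distributing "or" over "and", can be sketched in a few lines. The nested-tuple encoding of formulas below is my own illustration, not the course's code, and it assumes negations have already been pushed inward:

```python
# Sketch of the distribute step of CNF conversion (toy representation):
# a formula is a literal string like 'P', or a tuple ('and', f, g) / ('or', f, g).

def to_cnf(f):
    """Distribute 'or' over 'and', assuming negations are already pushed in."""
    if isinstance(f, tuple) and f[0] == 'and':
        return ('and', to_cnf(f[1]), to_cnf(f[2]))
    if isinstance(f, tuple) and f[0] == 'or':
        a, b = to_cnf(f[1]), to_cnf(f[2])
        # (x and y) or b  ==  (x or b) and (y or b)
        if isinstance(a, tuple) and a[0] == 'and':
            return ('and', to_cnf(('or', a[1], b)), to_cnf(('or', a[2], b)))
        # a or (x and y)  ==  (a or x) and (a or y)
        if isinstance(b, tuple) and b[0] == 'and':
            return ('and', to_cnf(('or', a, b[1])), to_cnf(('or', a, b[2])))
        return ('or', a, b)
    return f  # a literal is already in CNF

# Example: P or (Q and R)  ->  (P or Q) and (P or R)
print(to_cnf(('or', 'P', ('and', 'Q', 'R'))))
# -> ('and', ('or', 'P', 'Q'), ('or', 'P', 'R'))
```

After this step every formula is a conjunction of clauses, which is the shape the resolution rule below expects.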
I've also dropped the universal quantifier on all my variables here. [00:08:01] Those are the core differences. [00:08:04] Okay, so now we are ready to talk about resolution. Now that we can write our first-order logic formulas in CNF, we can write the resolution rule as follows. [00:08:15] We have the atomic formulas F1 ∨ ... ∨ Fn ∨ p, and then another thing in our set of premises, ¬q ∨ G1 ∨ ... ∨ Gm, and notice that p and q could be different things, because they might just look different from each other. [00:08:31] So what we do is unify p and q, and when we unify p and q we get a substitution θ. [00:08:40] Then what we can derive here from resolution is the disjunction F1 ∨ ... ∨ Fn ∨ G1 ∨ ... ∨ Gm, with θ substituted into it: we are basically canceling out p and q with each other, and the reason we can do that is that we have unified p and q with the substitution θ, so in this new formula we are substituting θ into the formula. [00:09:05] This is similar to the substitution and unification that we did in modus ponens; we're just doing it now in resolution, on these CNF clauses that we have just created. [00:09:18] Okay, let's look at an example. Let's say that I have two CNF clauses: Animal(Y(x)) ∨ Loves(Z(x), x), and ¬Loves(u, v) ∨ Feeds(u, v). [00:09:35] Loves and the negation of Loves are the things I would like to be able to do unification on. So if I unify these two, I'm going to come up with a substitution that says: substitute the variable u with the function Z(x), and substitute the variable v with the variable x.
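The unification step in this example can be sketched as follows. The term encoding (tuples for function applications, lowercase single-letter strings for variables) and the helper names are my own assumptions, not the lecture's code:

```python
# Sketch of first-order unification (toy representation):
# a term is a variable like 'u', or a tuple (functor, arg1, ...).

def is_var(t):
    return isinstance(t, str) and t.islower()

def substitute(t, theta):
    """Apply substitution theta to term t, following chains of bindings."""
    if is_var(t):
        return substitute(theta[t], theta) if t in theta else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, theta) for a in t[1:])
    return t

def unify(s, t, theta=None):
    """Return a substitution making s and t equal, or None on failure.
    (No occurs-check, for brevity; real implementations need one.)"""
    theta = dict(theta or {})
    s, t = substitute(s, theta), substitute(t, theta)
    if s == t:
        return theta
    if is_var(s):
        theta[s] = t
        return theta
    if is_var(t):
        theta[t] = s
        return theta
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        for a, b in zip(s[1:], t[1:]):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None

# Unify Loves(u, v) with Loves(Z(x), x), as in the lecture's example:
theta = unify(('Loves', 'u', 'v'), ('Loves', ('Z', 'x'), 'x'))
print(theta)  # {'u': ('Z', 'x'), 'v': 'x'}

# Applying theta to the surviving literal Feeds(u, v) gives Feeds(Z(x), x):
print(substitute(('Feeds', 'u', 'v'), theta))  # ('Feeds', ('Z', 'x'), 'x')
```

The resolvent is then the remaining literals with θ applied, exactly as the rule above prescribes.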
[00:09:55] And at the end of the day, the thing that I'm inferring, that I'm deriving here, is going to basically cancel out these two and give Animal(Y(x)) ∨ Feeds(u, v), except that I'm not going to leave u and v in there anymore. Why is that? Because I'm substituting this θ: I'm substituting Z(x) for u and x for v. [00:10:17] So the thing that I'm proving at the end of the day is Animal(Y(x)) ∨ Feeds(Z(x), x). [00:10:26] Okay, so there's quite a bit of symbol manipulation going on here, but the gist of it is that this is very similar to the resolution we have seen so far, combined with unification and substitution over these new CNF clauses that we have talked about. [00:10:39] And that summarizes how we do inference using resolution in first-order logic. ================================================================================ LECTURE 050 ================================================================================ Logic 10 - Recap |
Stanford CS221: Artificial Intelligence (Autumn 2021) Source: https://www.youtube.com/watch?v=LYsOjtmLpPo --- Transcript [00:00:05] okay so this is the last module of the last set of lectures this quarter, so let's do a recap of logic. What have we talked about? [00:00:15] We talked about logic this week, and we talked about three main ingredients of logic. We talked about syntax, which basically defines a set of formulas: it allows us to syntactically, symbolically talk about formulas and about things that exist in the world. [00:00:29] So for example I might say Rain ∧ Wet without knowing what "rain" means or what this "and" symbol means; that is the syntax land, where I just have symbols and I can manipulate those symbols. [00:00:42] And then I can assign meanings to the syntax using semantics. The idea of semantics is that for every formula f you can specify a set of models M(f), which is basically a set of assignments or configurations of the world that assign meaning to a syntactic formula f. [00:01:00] So in the case of Rain ∧ Wet, for example, rain can take values 0 and 1, wet can take values 0 and 1, and it would be the darker area that corresponds to the meaning of both rain and wet being true. [00:01:15] So in general, when we try to define a logic, we need both syntax and semantics: syntax as a way of just writing out the formulas, semantics as a way of giving meaning to those formulas. [00:01:28] And in addition to syntax and semantics, we talked about inference rules. We spent quite a bit of time talking about modus ponens and resolution, for both propositional logic and first-order logic, as ways of doing inference on our knowledge base. [00:01:40] So we have a knowledge base, which has a bunch of formulas, and the question is: what are some new formulas that we can derive from that knowledge base? For example, you might have Rain ∧ Wet, and from that I can derive Rain; I can actually derive and prove that it's raining. [00:01:55] So how do we think about inference rules, how can we infer new formulas, and what can we tell about the formulas that we infer? I think that is also an interesting question that we have been talking about. [00:02:09] So how do we think about inference algorithms? If we have a knowledge base and an inference rule like modus ponens or resolution, we should repeatedly apply that inference rule to derive new formulas f. [00:02:22] As we get new formulas, we're expanding our knowledge base, but we're shrinking the space of models, because we're adding more constraints: if I add a new formula, in general I'm shrinking my space. If I'm deriving a formula, though, one that just follows from the knowledge base, it's not really changing the space of models. [00:02:43] So here is an example. Let's say I have Wet, Weekday, and Wet ∧ Weekday → Traffic. From these three formulas in my premises, what can I conclude, what can I infer? We talked about modus ponens as an inference rule that allows us to infer Traffic out of this. [00:03:00] More generally, what modus ponens does is this: given a set of propositional symbols p1 through pk, and a formula p1 ∧ ... ∧ pk → q, it says we can derive q. That is what modus ponens does. [00:03:20] And then we talked about soundness and completeness of inference rules, with modus ponens as an example.
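The "repeatedly apply modus ponens" idea can be sketched as a small fixed-point loop over the knowledge base. The representation of facts and rules below is my own toy encoding, not the course's:

```python
# Sketch of applying modus ponens to a fixed point (toy representation):
# a fact is a symbol string; a rule is (set_of_premises, conclusion),
# encoding p1 and ... and pk -> q.

def forward_chain(facts, rules):
    """Apply modus ponens repeatedly until no new symbol can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # If all premises hold and q is new, derive q.
            if conclusion not in derived and all(p in derived for p in premises):
                derived.add(conclusion)
                changed = True
    return derived

# The lecture's example: Wet, Weekday, and Wet and Weekday -> Traffic.
kb = forward_chain({'Wet', 'Weekday'}, [({'Wet', 'Weekday'}, 'Traffic')])
print(sorted(kb))  # ['Traffic', 'Weekday', 'Wet']
```

Note the linear-time flavor mentioned later in this recap: each pass can only add whole symbols, so the loop runs at most once per symbol in the knowledge base.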
[00:03:28] So what does soundness mean? Soundness means that if you're deriving new formulas, you need to make sure that these new formulas are actually true: they live in the space of things that are entailed. [00:03:42] And if you remember our example with the glass and the water inside the glass, what soundness means is that anything we derive should be inside the glass, because everything that's in the glass is entailed and is true. So if you want your rule to be sound, everything you derive has to be inside the glass. [00:04:02] On the other hand, we talked about completeness, and completeness means that we are deriving the whole truth: I should be able to derive everything that is inside the glass, or even more. That is what completeness means. [00:04:17] And if I have both soundness and completeness, then derivation and entailment are basically the same thing. Remember, entailment is about meanings, about semantics: entailment asks whether f is actually entailed by the knowledge base or contradicted by it. But derivation is just basically symbol manipulation. [00:04:41] So it's difficult to reason about entailment in the semantics land, because you have to think about meanings and so on; but with derivation, in the world of syntax, you're just moving formulas around, and by moving formulas around and applying inference rules, almost mindlessly, you can derive new formulas. That gives you a compact way of thinking about these formulas and the new formulas being derived. [00:05:04] So if derivation is the same thing as entailment, that's pretty nice: if you have a virtual assistant and you want to ask it a question, or tell it some information, that corresponds to an entailment question, and that might be difficult to answer. Instead, if you have a sound and complete inference rule for your logic, then you can just check derivation, and derivation alone is going to give you the answer. [00:05:32] That is why we talked about derivation for a bit. [00:05:35] And we discussed modus ponens for propositional logic, and the fact that modus ponens is actually sound for propositional logic, but it is not complete: it's not able to get all the formulas that are true. [00:05:49] So in order to solve that, we had two solutions. One
was that maybe propositional logic is too large: maybe we should reduce the size of propositional logic. The other idea was that maybe modus ponens is not strong enough: maybe we should make modus ponens stronger, or come up with a stronger inference rule. [00:06:07] So let's talk about those two ideas. The first idea: propositional logic allows us to talk about any legal combination of symbols, and that is pretty expressive, but maybe it is too expressive. So maybe we can just look at propositional logic with only Horn clauses. [00:06:23] This is a restricted set of logical formulas, and that restriction allows us to get both soundness and completeness with modus ponens. [00:06:35] So what is a Horn clause? A Horn clause is basically a clause that has at most one positive literal: if you write it in conjunctive normal form, you want to make sure that you have at most one positive literal. [00:06:50] Another way of writing it is that you have an "and" of a set of propositional symbols p1 through pk, and that implies some q; you basically want to make sure that there are no "or"s, no branchings, here. That is why we could actually show completeness with Horn clauses. [00:07:10] All right, so that was propositional logic with Horn clauses using modus ponens, and that gives us completeness; general propositional logic doesn't give us completeness with modus ponens. [00:07:23] The other option we discussed is that maybe we should have a fancier inference rule, specifically resolution, which was the thing we started looking at. [00:07:32] Resolution was able to give us both soundness and completeness; the issue with it was that it actually takes exponential time.
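The "at most one positive literal" definition of a Horn clause can be checked directly. The clause encoding below (a set of literal strings, with a leading '~' marking negation) is my own convention, not the lecture's:

```python
# Sketch of the Horn-clause test (toy representation): a clause is a set
# of literal strings, with '~' prefixing negative literals.

def is_horn(clause):
    """A clause is Horn iff it contains at most one positive literal."""
    positives = [lit for lit in clause if not lit.startswith('~')]
    return len(positives) <= 1

# (~Wet or ~Weekday or Traffic) is the clause form of Wet and Weekday -> Traffic.
print(is_horn({'~Wet', '~Weekday', 'Traffic'}))  # True
print(is_horn({'Rain', 'Traffic'}))              # False: two positive literals
```

This is why Horn clauses have no "branchings": a rule p1 ∧ ... ∧ pk → q becomes ¬p1 ∨ ... ∨ ¬pk ∨ q, with q as the single positive literal.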
[00:07:42] This is as opposed to the linear time of modus ponens, where we keep adding only one formula at a time, so at most we end up with n formulas; with resolution we might have an exponential-time algorithm, but we end up getting both soundness and completeness, which is nice. [00:07:57] Okay, so that was all about propositional logic. At some point we started talking about first-order logic: we started expanding our logic, trying to be more expressive, to talk about variables and quantifiers, and to have a better way of representing things that are much harder to represent in propositional logic. [00:08:17] And then we talked about syntax, semantics, and inference rules for first-order logic; basically we went over the same things for first-order logic. [00:08:26] Comparing propositional logic with first-order logic: in propositional logic we have the option of doing model checking, when we think about our models and their semantics. In first-order logic we don't really have an analog of that, but we have this other thing called propositionalization: for a subset of first-order logic formulas, we can propositionalize, and that takes us back to propositional logic land, where we can use the same tools that are there. [00:08:55] Thinking about inference rules, we discussed modus ponens with Horn clauses and the fact that it is sound and complete; the same story is true in first-order logic, so we could apply modus ponens with Horn clauses in first-order logic and it's sound and complete. There's a "plus plus" here, and that plus plus basically means that we had to change modus ponens a little bit: we discussed unification and substitution, because we have variables here.
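The propositionalization idea mentioned a moment ago can be sketched as grounding a quantified formula over a finite set of constants. The string template and the predicate names below are illustrative assumptions of mine, not from the lecture:

```python
# Sketch of propositionalization (toy encoding): turn a universally
# quantified template like "for all x: Student(x) -> Tired(x)" into one
# propositional formula per constant in a finite domain.

from itertools import product

def propositionalize(template, variables, constants):
    """Instantiate each variable in `template` with every constant.

    `template` is a string with placeholders, e.g. 'Student({x}) -> Tired({x})'.
    """
    grounded = []
    for values in product(constants, repeat=len(variables)):
        binding = dict(zip(variables, values))
        grounded.append(template.format(**binding))
    return grounded

rules = propositionalize('Student({x}) -> Tired({x})', ['x'], ['alice', 'bob'])
print(rules)  # ['Student(alice) -> Tired(alice)', 'Student(bob) -> Tired(bob)']
```

Each grounded string is now just a propositional formula over atomic symbols, so propositional tools (model checking, modus ponens, resolution) apply; the cost is that the number of groundings grows with the domain size raised to the number of variables.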
because there are [00:09:22] variables here so because there are those variables you should be able to we [00:09:24] those variables you should be able to we should we should apply unification and [00:09:26] should we should apply unification and substitution to make sure that our modus [00:09:28] substitution to make sure that our modus ponens makes sense in the space of first [00:09:30] ponens makes sense in the space of first order logic [00:09:32] order logic similarly we discussed resolution and [00:09:34] similarly we discussed resolution and then it showed that it is general and [00:09:35] then it showed that it is general and it's sound and complete in propositional [00:09:37] it's sound and complete in propositional logic and in the case of first order [00:09:39] logic and in the case of first order logic we briefly discussed this in an [00:09:42] logic we briefly discussed this in an optional uh module and again like this [00:09:45] optional uh module and again like this is resolution plus plus because we are [00:09:47] is resolution plus plus because we are talking about applying unification and [00:09:50] talking about applying unification and and substitution on resolution too and [00:09:52] and substitution on resolution too and again it sounded complete even under [00:09:54] again it sounded complete even under first order logic which is kind of nice [00:09:57] first order logic which is kind of nice all right so that summarizes our logic [00:09:59] all right so that summarizes our logic lecture i just want to like leave you [00:10:02] lecture i just want to like leave you guys with one thought when you think [00:10:04] guys with one thought when you think about logic [00:10:05] about logic so so what is it about logic that is [00:10:07] so so what is it about logic that is useful again [00:10:08] useful again we talked about all the all the [00:10:10] we talked about all the all the limitations of it right the fact that it [00:10:12] limitations of 
it, right? The fact that it can't handle uncertainty, that it's not really probabilistic, that it's pretty brittle, that it's not able to capture data; as you get more data it's hard to update its rules, because it has all these deterministic rules that are built on top of each other. But it does have one big benefit, and that one big benefit is that it allows us to have a very compact and concise way of representing knowledge that we wouldn't normally have. Remember, the whole point of inference rules was that I had this logical formula, which is a very compact way of thinking about semantics and knowledge that's actually pretty difficult to represent in the semantics land. And now that I have this concise formula, I can just manipulate it, I can move it around, and I can apply all sorts of inference rules on top of it; I can come up with new formulas, derive new formulas, and prove new formulas. That's pretty interesting, and it's much harder to do that in the semantics land. So the thing that logic really gives us, its really big strength here, is this compact representation that can help us think better about formulas and do better manipulation of them. And I think one thing that would be very interesting to think about is how we could use these ideas, maybe not exactly logic, but ideas from logic, in some of the more modern AI systems, some of the more modern machine-learning-based systems. I think that is a pretty interesting view of logic that would be good to take from this class.

================================================================================
LECTURE 051
================================================================================
AI and Law I Mariano-Florentino Cuéllar, President of the Carnegie Endowment for International Peace
Source: https://www.youtube.com/watch?v=_-hBu3_Jz-0
---
Transcript

So today we have the pleasure of hearing from Justice Mariano-Florentino
Cuéllar. Tino is a professor in the law school at Stanford; he's also a justice on the California Supreme Court, and he was an official in the Clinton and Obama administrations, which is incredibly cool. Tino did his undergrad at Harvard, he did law school at Yale, and he also has a PhD in political science from Stanford. He has done a lot of work around cyber law, and actually AI and legislation, as well as other work around international affairs and public health. He also teaches a class on regulating AI, which is a very cool class, so if you're interested in these areas and these topics I absolutely recommend taking that class; especially after taking 221, I think that would be a really good class to take. We've been interacting with Tino through the AI Safety Center and the Human-Centered AI Institute over the past couple of years, and we have a project together right now on adaptive agents. So it's really great to work with Tino, it's really great to hear from him, and for me he's on the list of the top five people I would want to have a conversation with, and this includes roboticists; this is a very small list. So it's really great today to hear from Tino, and I'm excited to hear your talk. Welcome.

Thank you very much, Professor Sadigh, Dorsa, and thank you Percy and Peng and Woody. It's really an honor to be here and to share some time with you. I have to tell you that that last comment you made, Dorsa, is a lot of pressure; I don't want to let the class down and get demoted and not be on your top-five list. It's also been really great to get to know you, and I've learned so much from all of our interactions. I appreciate that you've come to
speak at my class, so it's only fair, and it's really an honor to be here. I want to take about 35 to 40 minutes, which I know in the era of Zoom is a long time, so I'm going to hope that those of you who have been good enough to tune in, and I know doing this live is optional, are going to find this worthwhile. I want us to have a lot of time for discussion, but let me just give you a quick overview of what I mostly want to do. I want to explore with you why your interest in artificial intelligence, which is what led you to take this class, is actually incredibly relevant to policy, to politics, and to law; and along the way you're going to see it's also relevant to international affairs and geopolitics. But really, in the course of this talk, I want to share with you some reasons not only why you should be interested in law and policy and take your technical knowledge and expect that it's going to be relevant to a lot of really important questions the world is facing; I also want to give you a sense of how I became really interested in this subject along the way. And I'm going to try to share my slides now so you have a better sense of what we're talking about.

So let me start by noting that right now you're at an amazing moment in your life. You're learning about artificial intelligence, and you have this extraordinary university around you, at least virtually; eventually you'll be back here physically, I hope and expect. You can look at this talk and think about it from the perspective of a technical expert, which is what you're becoming by taking this class. But before we get to that, I want you to imagine yourself not as a technical expert but as just a citizen, somebody who has to think about how this technology affects daily life: who's being affected by it, where are the inequities, what are the opportunities for understanding it better. And then, near the end of the talk, I want you to imagine yourself as a policymaker, somebody who has to make decisions about how to allocate scarce resources, where government budgets should go, and what people should do in the legislature and the courts around how to resolve the technical questions and policy questions and legal questions that arise. What you're going to find is that your technical knowledge is extremely relevant to a lot of these crucial issues, but at the same time you need to round out that knowledge by understanding a little bit about the legal system and about organizations. So the bottom line really is that I'm going to share with you a lot of different messages, but the
core message is that this technology that you are learning to master has not only benefits but risks, and in the course of implementing that technology, society is going to be shaping how that technology is used through the legal system, and also through organizations: through the associations, the institutions, the groups, but especially the firms and the agencies that so many of us are going to work in, like law firms, government agencies, big corporations, and nonprofit organizations.

Now, I want to tell you, because I know that it is difficult to hang on to your attention, but I'm going to try, that there are some things that I absolutely want to have you remember. If you remember one thing about my whole presentation, it's that the impact of artificial intelligence on the world, on your daily life, is a function of law and organizations. It's not anything that actually acts directly by itself; it has to be mediated by some organization, by what Stanford does, or what the, you know, Republican Party does, or what the United Nations does. But it's also mediated by legal rules, and along the way you're going to find that we might sometimes talk as though we're discussing the possibility of developing legal rules that will apply to AI. Well, I'm here to suggest to you that many of those rules already exist; the question is just how to translate them to this context. If I can convince you to remember two things and not just one, I'd like you to remember the point above, but then also some crucial terminology, and that is that the techniques of AI, like machine learning, are different from AI systems or applications; these are the mechanisms, obviously, that instantiate the techniques, that are attached to a user interface (I'll say more about this later), and that actually spit out information, recommendations, insights that people will then act on. And if, miraculously enough, I can get you to remember three things, and this is the last thing I really want you to remember for sure, it's the previous two points plus the point that law is kind of merging with the design and policy challenges that are implicit in AI. So what I'm going to end up telling you is that lawyers are becoming more and more a little bit like people like you, who are trying to wrap their minds around machine learning, supervised learning, unsupervised learning, reinforcement learning; and in the same way, you and your community, the people who are the technical experts, are increasingly pushing to ask questions like: what is the right way to use this technology, what do we want it to do and
not do?

So with that as background, let me acknowledge more explicitly the benefits side of the AI technology you're learning to master, because if we don't, then we're going to get a pretty distorted picture. If you were physically on campus right now, walking around Stanford, you could go to ten different places on campus where really cool stuff is happening that is relevant to real problems people are facing around the world, and where AI techniques are being used to try to make the world a little bit of a better place. So let's take, for example, the population of the world that is facing serious nutritional stress, meaning people who are at serious risk of starving. A generation ago that population was much bigger than it is now, but sadly it is still stubbornly large: 700 million people or so face serious food insecurity. These are generally the people living on a dollar a day or less; you see some of the kids here. Overwhelmingly that population is concentrated in Africa and in India and in Asia, but there are also some people in North America, and even in Europe, who face food insecurity. Think of the different ways that we have to allocate resources effectively, to make sure food doesn't go to waste, to make supply chains more efficient, to pinpoint where there are problems in real time. And what's more, a lot of this population not only faces problems around food but also around education. The distribution of access to high-quality education is incredibly unequal; as you know, we're all a part of that system, we take part in it. So when I think about the future of both nutrition and education in a world that is more equitable and more benign, I cannot imagine that future without some use of artificial intelligence techniques: to democratize education, to make the delivery of food more efficient, to pinpoint problems in real time.

In somewhat similar fashion, this quirky set of four images you see here is an example of the work that Dan Ho, my colleague at the law school, is doing with some colleagues: using satellite imagery to pinpoint where sources of pollution are, in much more accurate fashion than anything the government currently has. What that would allow us to do is more effectively cross-reference the self-reported data that comes from firms that claim to be complying with environmental law against the reality, and it takes some fairly sophisticated, but also in some ways intuitive, machine learning techniques to make use of this visual data. And then you've got a
picture of a courtroom. This is not the kind of courtroom where I sit, because it's a trial courtroom; mostly this is where trials are actually heard, in superior court. The reality is that in California, if we had more time and if we were in person, I would ask you to guess the number of cases we hear in California courts every year. Generally speaking, when I ask that question, people say 20,000, and I give a shocked response: that's too low. People say, okay, 200,000, and my eyebrows still go up, and finally we'll get to something like 800,000. Well, the actual answer is something like 6 million cases a year. So it will not shock you to hear that in probably 40 to 50 percent of those cases the litigants are self-represented. They're people like you; they don't have a lawyer; they are trying their best to navigate an incredibly complicated system. I would love to imagine a world where the distribution of legal knowledge is not so restricted, just to people who have a Stanford law degree or a similarly great credential, or who can pay a lot of money for a fancy lawyer, but where software and AI systems that you might help design can help people navigate a very intricate legal system. But at the very far right you see a picture of an African-American man under the words "criminal justice," and there's a question mark there. Why I'm doing that is to highlight that this whole world we can imagine also has its risks and its downsides, and to make this more concrete I want to focus on one person in particular. The gentleman you see here, Robert Williams, is one of many people whose lives are being affected by the fact that artificial intelligence systems are not just theoretical anymore in terms of their practical application. They're being
used in all kinds of settings, including in the criminal justice system. So here he was one day, in a suburb of Detroit, when he gets arrested by police. He's told that he's being arrested because he's suspected of committing larceny, which is a fancy word for stealing: robbing a store in Detroit. And it turns out that the police were using an image recognition system doing facial recognition, and they had a database, a corpus, of 49 million images, and the system indicated that the image from the security camera in Detroit matched Robert Williams's picture. He was arrested. Now you might ask: did the police have another reason to suspect him, were there outstanding arrest warrants for him, had he committed similar crimes in the past? And the answer is no, no, no. Okay, so once he was arrested, the police admitted that the photo is a little blurry. They admitted that they didn't have any other information about him, and after a little bit more discussion they ultimately agreed with Mr. Williams that the picture really just didn't look like him, meaning the intuitive human response was, no, that doesn't seem to be you; but the algorithm says it's you, so what do we do? The answer is he was detained for 30 hours.

Now, I'm not suggesting that there aren't worse things than being detained for no reason for 30 hours. But I'll tell you, I grew up on the U.S.-Mexico border, and it was a fact of life in my family that sometimes you need to cross over to the American side to go shopping or do something else like that, and being detained even 45 minutes, an hour, an hour and 15 minutes, all those things happened to me; it's not very pleasant. So you can imagine what it's like, or you can begin to imagine if you try, what it's like to be detained more than 30 hours and then be told that it's because a computer made a mistake. "The computer must have gotten it wrong" was the exact thing that he was told. Everything that I want to share with you from here on out you could, in a way, sum up by asking this narrow question of why this happened to Mr. Williams, and what it means. What are the remedies? Do we have a legal system, a society, where we can sort of disentangle the mistaken uses from the correct ones and manage the risks appropriately? Can we take seriously the fact that humans also make mistakes when they're looking at faces? I'll say more about that in a moment. But I hope I can press you to think about the situation with Mr. Williams in a little bit of a broader context, because we could talk about criminal justice, or we could talk about testing. As you may know, the
intellectual the [00:13:49] as you may know the intellectual the international baccalaureate exams this [00:13:51] international baccalaureate exams this last year because of code were not [00:13:53] last year because of code were not actually given but instead students were [00:13:56] actually given but instead students were given a score that was their predicted [00:13:58] given a score that was their predicted score based on the previous portfolio [00:14:00] score based on the previous portfolio work they'd submitted we can talk about [00:14:03] work they'd submitted we can talk about testing uh [00:14:05] testing uh in remote settings where your facial [00:14:09] in remote settings where your facial your image is sort of being analyzed by [00:14:11] your image is sort of being analyzed by a camera that's trying to detect whether [00:14:13] a camera that's trying to detect whether you're cheating [00:14:14] you're cheating we can talk about insurance we can talk [00:14:17] we can talk about insurance we can talk about it [00:14:19] about it 36 other domains where this stuff is [00:14:21] 36 other domains where this stuff is really effective life and the broader [00:14:23] really effective life and the broader question really is [00:14:25] question really is what does the incident involving robert [00:14:27] what does the incident involving robert williams tell us about law [00:14:29] williams tell us about law about artificial intelligence and about [00:14:31] about artificial intelligence and about how society is changing [00:14:33] how society is changing and our legal system is changing in [00:14:35] and our legal system is changing in response to this technology [00:14:37] response to this technology so that is the tip of a very very big [00:14:38] so that is the tip of a very very big iceberg now let me acknowledge again [00:14:42] iceberg now let me acknowledge again the point about how [00:14:44] the point about how there's a lot about this subject that 
[00:14:47] there's a lot about this subject that goes deeper it doesn't just start with [00:14:49] goes deeper it doesn't just start with the history of artificial intelligence [00:14:51] the history of artificial intelligence it actually starts with the history of [00:14:55] it actually starts with the history of really modern society [00:14:57] really modern society now on the screen you see the picture of [00:14:59] now on the screen you see the picture of a very intense looking man [00:15:01] a very intense looking man named [00:15:02] named max weber [00:15:03] max weber for anybody who's ever taken a class on [00:15:05] for anybody who's ever taken a class on social theory or sociology his name [00:15:07] social theory or sociology his name might be familiar [00:15:09] might be familiar there's a lot i could say about him but [00:15:10] there's a lot i could say about him but here's the main point i want to make [00:15:12] here's the main point i want to make writing [00:15:13] writing in the very early 20th century max weber [00:15:16] in the very early 20th century max weber was looking around society and observing [00:15:18] was looking around society and observing things noting [00:15:20] things noting society didn't work the same way [00:15:22] society didn't work the same way then that it did 100 or 200 years ago [00:15:26] then that it did 100 or 200 years ago many many people worked inside [00:15:27] many many people worked inside organizations with a hierarchy [00:15:30] organizations with a hierarchy formal systems of authority [00:15:32] formal systems of authority organizations had a director an [00:15:33] organizations had a director an assistant director officials clerks [00:15:37] assistant director officials clerks and all of this observed max labor was a [00:15:40] and all of this observed max labor was a means to which the modern nation state [00:15:42] means to which the modern nation state processed information [00:15:43] processed 
took it, and decided what to do with it rationally: sometimes developing the mechanisms to act as if by reflex, by recognizing the kind of problem and quickly delivering a response; sometimes by elevating it to people who could sit in an office, talk in a conference room, and come up with a solution, thinking, presumably, logically. And what Max Weber noted, much to the influence of people who came after him, including yours truly, is that these bureaucracies aspired to work like a machine, right? They were trying to automate the process of decision-making in some way, to the point that it could be predictable and rational.

[00:16:23] And Weber pointed out that that was all well and good, but there were going to be some problems along the way. And in some ways I'm here to tell you that many of the problems that Weber highlighted remain: how we have a love-hate relationship with these bureaucracies. On the one hand, we think that they're inefficient, that they're rule-bound, that they're not creative, that they're frustrating, that they're slow; but at the same time we can't live without them. That will end up illuminating, in some ways, some of the really interesting choices we have about how we use artificial intelligence: maybe in some ways to replace conventional bureaucracy, but, I would argue, in other ways to replicate, and in some ways channel, some of the same tragic conflicts and tensions.

[00:17:05] Now, channeling Max Weber to some degree, and also reflecting my own interest in AI, in 2016 I wrote a piece that had the following punchline, basically, which was: sometimes we're going to deal with the concerns we have about the role of artificial intelligence by suggesting that really all we're building are recommendation engines, not really that different from the way Netflix works: "you may also like to watch this." Judge, you may think that this person deserves a harsher sentence, but it's really up to you, judge; you don't have to be the one to decide. Or rather, you do have to be the one to decide; we don't have to be the ones to decide, we the ones who designed the AI system. We're just giving you a recommendation; we're using these techniques to give you a sense of what the likelihood is that this person will reoffend.

[00:17:50] And the point I was trying to make in 2016, which seems now like a long time ago, is that actually that line between the computer program, in particular the AI system that has sophisticated user-interface capacities to sort of speak to you in natural language, or to serve up the information in a way that's easier for you to assimilate... it's really difficult to police that line between "they're just supporting your decision" and "they're actually making the decision."

[00:18:16] Now here's one place where I can highlight my point at the very beginning about how law merges with organizations, which merge with AI, if you really want to understand the effect. Right? So if you want to know if an AI system is actually serving as a decision-support tool, rather than actually making a decision, you're going to want to know the answer to questions like: well, are the designers of that system liable if it turns out to make a recommendation that's really, really bad, that results in people getting injured? Or, conversely, is the organization run in a way that the decision maker using the AI system is being audited, and is being checked to see if all of her decisions are just exactly rubber-stamping what the software does? And if that's the case, what's the point of having
the human decision making in the loop anyway, right? So I'm giving you the sense that we're building up to this point of all these conflicts and questions, and meanwhile people like Robert Williams are getting arrested.

[00:19:12] But now let me return to this point about how humans often are not great decision makers either. So we can think about where it is that human cognition fails in terms of perception. We can think about how humans add up information and come up with a thought or a decision. We can think about what motivates humans: even if I have every reason in the world, based on my job, to be fair when I'm working in a police station and I'm deciding who to arrest, if I have an improper motivation, if I want to impress somebody who happens to be on a ride-along with me that day, or if I really dislike the person who works in this particular area of town and I want to arrest them because I have a nefarious motivation, that can mean that human decision making gets all messed up, and even the legal arrangements we have to police human behavior can fall short.

[00:20:04] But my next slide, which is probably the messiest slide of the whole presentation, so you don't have to memorize it or even read it all, you know, I can make these available to you later, but here's the punchline: the mere argument that humans are not as good as the performance of AI systems in a discrete test like facial recognition does not really answer the question of how you want AI systems to be used by organizations to make decisions, because the devil's really in the details. Let me just pick two points here to highlight.

[00:20:35] Let's talk about perception. So the field of the neurophysiology of how vision works is really, really complicated and fascinating, and it's not an accident, I would argue, that some of the coolest things that we have been learning about how to develop better image-recognition systems in the AI space are influenced by what we learned from neuroscience. But the fact that that's still a bit of a mystery highlights to you that we actually only understand a little bit about how humans make visual-processing decisions. For example, we know that it takes about 100 milliseconds for humans to perceive whether a picture reflects a person of one gender or another, generally; for humans to pick up emotions; for humans to recognize familiar faces. But eyewitness identification, remember, involves unfamiliar faces: do you remember whether this image is showing you the person that you think you saw two weeks ago, when the glass was shattered and somebody came into your apartment at night and grabbed your beautiful collection of baseball cards and left? That is a lot less exact, and as one of my colleagues explained in a dissenting opinion in a case called People v. Reed, we would be grossly inaccurate if we suggested that that is a system of identification that works really, really well. But then, of course, if you compare that to the way AI systems work: on the one hand, AI systems might be, in the lab, much more accurate than humans at picking out the similarity between two images chosen at random, not ones that are sort of known earlier the way humans might know them. But on the other hand, the ability of those systems outside of the lab to operate effectively, and particularly to detect emotions, for example, is not so great. These systems have, in a number of applications and
instantiations, real differences in how effectively they work for pictures of people who identify as white rather than for Black or Asian people. And of course you have all kinds of other failure modes, like hacking.

[00:22:41] And then, of course, we could talk about legal arrangements, and here I would just note that we humans have hundreds of years of experience dealing with human mistakes; that's really what the legal system is designed to do. We are only learning now how to adapt our legal rules and standards to deal with the mistakes that machines make. We're not starting from scratch, but it would be a mistake to assume that we've figured out exactly how to do that.

[00:23:13] So now I want to make the point that when we are dealing with problems posed by AI in the legal system, we are not starting from scratch, and the best way I can make that point is to just highlight, for those of you who are vicariously interested in asking yourself what it would be like to go to law school, what that would feel like, and you're thinking, well, maybe that would not be terrible, maybe it might be kind of fun, I'll give you a flavor of some of the subjects that people learn about in law school. And it will not take a rocket scientist, it will not take a Stanford computer science professor, to see that these subjects that we cover in law school are just literally touching right up against AI already, and it will continue to be the case.

[00:23:53] So, an area of law called agency law is where we figure out, like: if Professor Sadigh says to a TA, I want you to go across campus, and I want you to pick up this particular computer, and I want you to carry it to the other side of campus, and along the way the person picks up the computer but then gets distracted, drops the computer, and kills a bird, and it turns out that that bird is the prize-winning bird of somebody's, like, bird collection or whatever, does Professor Sadigh end up being responsible? Well, agency law resolves that kind of question: when are you responsible for the actions of others in your organization, of your agent? Now, ordinarily agency law applies to the actions that you begin to put in motion that some other human being engages in, but you can totally see how this branch of law is beginning to grapple with the question of when you are responsible for the actions that you set in motion because you designed an agent to do something, like to sort employee applicants, and then the agent does that: the artificial, software-based agent.

[00:24:54] Okay, so then you have my core field of administrative law and legislation. This is the law of what counts as sufficient justification for any action of government. If the president signs an executive order saying, I don't want the census to keep on going until December, I want it to stop in October, when does the president have the power to do that? How does that power get into some conflict, potentially, with the power of Congress to pass a law saying how long the census is supposed to continue? You get the idea. What if the government says, well, you're going to have to move out of this home because we want to build a road through here; what right do you have to challenge that kind of action? So obviously, the more and more that government decision making involves reliance on machines, the more and more this branch of law is going to have to deal with the question of what it means when the machine is empowered to play a crucial role in that government decision. Does that make it
more reliable, less reliable, more fair, less fair? When can we do that, when can we not do that?

[00:25:54] Last but certainly not least: tort law. Tort law is about who has a duty to whom, what counts as a reasonable decision, and how we attribute causal responsibility for bad things that happen. Translation: let's say you're back on campus and sadly you get COVID-19. Can you blame the university? When? Why can you blame the university, why can you not? Or, you know, forget COVID-19 for a moment; let's suppose that you're in a lab and sadly your lab partner decides to try to attack you, and you survive, but you're asking, well, wasn't the university responsible for making sure that I wasn't attacked? That's tort law. And you can imagine that as the information that is the fuel of modern AI systems, the sort of fuel for machine learning, increasingly flows to systems that are interconnected, questions about what a decision maker does with that information, and whether that information makes the decision maker responsible for a different kind of safety protection relative to somebody that could be protected, all become more interesting.

[00:26:59] Okay, so let me give you some context for how to think about these problems by just acknowledging that the history of AI is kind of long, and it does not start with the birth of the internet; it goes back further in history, to some of Professor Sadigh's colleagues in the computer science department at Stanford. So I could go on and on about this, but my little subtext, in addition to what I want to share about the history of AI, is to kind of quickly give you a sense of how in the world I became super interested in this: beginning a little in college, but then again when I worked in government at the Treasury Department, and even more so when I came back from working for Obama in 2010.

[00:27:36] So just look at those pictures for a moment. You might recognize some of these faces; I'm sure you recognize at least one, the one with the woman in red, as it were. But if you go back a little further, what you're going to see in the picture under "1950s" is Herbert Simon, a really, really smart man, whose parents were refugees from Germany, who spent most of his career at Carnegie Mellon University. And, I mean, come on, you have to be pretty smart if you start as a political scientist, but then become interested in psychology, end up writing about economics and winning the Nobel Prize in economics, and along the way become, like, a major pioneer of AI. That was Herbert Simon for you. He was so brilliant; I recommend to you any book or article ever written by Herbert Simon.

[00:28:24] Among other things, one of the reasons he won the Nobel Prize in economics is because he developed the notion of bounded rationality, which is at the core of what we call behavioral economics now: the notion that you may be best modeled, as a human, not as somebody who's trying to optimize, but as somebody who's trying to satisfy a certain threshold. And we can certainly use that insight to imagine how to design a software agent and how to do machine learning, which is one reason why you can imagine his expertise and brilliance got transferred over into AI. He's most associated, in AI, with the story of the development of systems to do, basically, like, first-order logic and mathematical-type reasoning, what some refer to as good old-fashioned AI. And I'll just note here that that was really important, but always treated as the holy grail, maybe; something was elusive and not possible to realize: the kind of instinctive, almost automatic decision making and motion that now is so much at the cutting edge of what we are helping robots and AI systems to do. The recognition piece was missing, even if the cognition piece, at least around how you prove theorems, was possible to instantiate early on.

[00:29:40] Briefly: by the 1970s the picture really is different. Here that picture includes Ed Feigenbaum, somebody who is a colleague in computer science and always someone fascinating to talk to, somebody who's been one of my mentors, a little bit, in trying to learn about AI. And he's very much associated with expert systems, with taking insights from not only the work of Herbert Simon, who was actually Ed Feigenbaum's advisor, but also from psychology and sociology and decision theory, to develop systems that could act
almost as experts and replicate knowledge in particular domains.

[00:30:19] And then by the 2000s, the real phenomenon that changes everything, and certainly gives rise to the prominence of the person in the third picture, Sheryl Sandberg, is the rise of the internet. Because of course, all this stuff about AI was happening partly in academic labs and partly in defense departments, but suddenly the ability we have to harvest and centralize billions and billions and billions of pieces of behavioral data from humans, and of course to do it in systems that work faster and have access to more computing power, lets us do some truly amazing things. And I'll just note here that my interest in this begins in college, in '93, when I was trying to understand how human decision making could be modeled, so very much the Herbert Simon sort of work. But when I was working in Treasury in the late '90s, it wasn't lost on me that there was just so much data that the US government had gathered around financial transactions. And I was interested in privacy, as you might have been, but also interested in the idea that, if that data were available, how could it be used in a way that was efficient, lawful, and analytically sophisticated, to detect really, really problematic uses of the financial system, including to commit corruption, for example, or to launder money, and so on. And so I became exposed to some of the techniques that you're learning about in this class right now.

[00:31:41] When I came back from the Obama administration in 2010, it struck me that so many of the domains in which I was working, particularly around public health and criminal justice, were already being affected by early examples and applications of this stuff. So I became really interested in trying to understand how this stuff would affect every aspect of decision making in law and political science, and in trying to learn more about what you're learning about right now.

[00:32:08] So here's where I want to highlight where my own thinking went after I returned from the White House in 2010. It struck me that some of the most interesting work happening in AI, in universities but also in the private sector, was about pushing the boundaries of analytical techniques to discern patterns: unsupervised learning, reinforcement learning, and so on. And the breakthroughs were really extraordinary, and they continue to be. But it was also striking to me that these techniques, in their raw form, were not necessarily designed to influence or help non-experts; they were not necessarily designed to solve real-world problems.

[00:32:41] So if instead you're looking at how AI techniques get used, like they were used in the arrest of Robert Williams, you're not dealing with AI techniques by themselves; you're dealing with AI systems, which my co-author and I defined, using probably a little too much mumbo jumbo, as a socio-technical embodiment of policy codified in an appropriate computational learning tool. So: a system to gather data and learn from the data, embedded in a specific institutional context, meaning it fits in an organization and is given a certain purview. People who make decisions are told, here's how you can use the tool, here's how you shouldn't use the tool. And really what that means is that when you want to understand how AI is being used in the real world, you have to understand relationships of power: who gets to decide that the system works the way it does, and that somebody can point to that system and claim
that it embodies some kind of intelligence.

[00:33:42] Why does this matter? Well, it matters because now, here, we get to the other side of the coin of the internet, right? We're not in a world where this is mostly happening in the lab. Right now we're in a world where really important things are being affected by AI. I cannot give this lecture without pointing to the toothbrush that somebody recently gave me as a gift, which advertises how it uses artificial intelligence to learn how to brush your teeth. And this is the genesis of a concept I call toothbrush maturity: when technology gets to be so ubiquitous that it intersects even with a toothbrush, then you know that you're dealing with something that has to be understood in its real-world context, and not just in the theoretical stories you can tell about how well it's going to work. Another example of this, really, though, is that the very large internet companies that are around us in Silicon Valley have a market capitalization that you can't really explain without understanding just how well online advertising must be working, and how much it's leveraging the enormous amounts of data that are generated by the internet and analyzed by some of the AI techniques that you are learning about here.

[00:34:48] Where is this going? It's so interesting, and the short answer is, I really wonder whether anyone fully knows. And that's true of almost any technology, right? You can't always predict. By the way, I'm about three quarters of the way through the presentation, so just bear with me for a few more minutes. I can point to the different things here, but the main point I want this slide to highlight for you is that some of the breakthroughs that we're seeing right now are not so much progress in terms of just more clever algorithms, or even more different data; it's partly just leveraging more and more computing power. I wonder where that's going to go; I don't know that that's sustainable.

[00:35:25] But I do think that if you want to get a sense of where this field is going, think a little bit about language in particular. Because if I go back and think a little bit about how government agencies were making decisions in the late '90s when I was there, most of the expert analysis was being done using techniques like probit and logit, econometrics like regression, stuff you're going to be learning about in this class, but it was being mediated through humans presenting to each other. What AI systems may increasingly have the capacity to do is to use those very same techniques, but to then communicate with the user in a way that is adaptive to the human, and able to leverage language in a way that software previously did not. So that persuasive ability of software is something we have never really seen before. And as we have more effective use of compute, and greater use of compute, I think the feats that will be possible when you marry up the GPT-3-type stuff with the analytics will be very different. Which is to say, a lot of the humans who are consuming the output are not necessarily going to be in a great position to be very sophisticated arbiters of whether what they're being told or recommended is accurate or not.

[00:36:36] Just to wrap up: there are all kinds of interesting intersections now between the law and AI, and policy problems that result. I want to make a pitch to you, and this is kind of tentative, I'm not as certain about this as I am about other things: that we're actually having
this really weird, bifurcated, bimodal distribution of attention to the problems, where some problems now are so familiar, even if we don't necessarily know how to solve them, that you will hear the buzzwords very often: explainability, interpretability, bias, privacy, etc. And these problems I think of as not medium- to long-term problems; they are present-day problems. They have hit already. Just ask Robert Williams.

[00:37:18] And then when you see an interview with, like, Elon Musk, you're going to hear about catastrophic or existential risk. I think that it would probably be a big mistake to ignore catastrophic or existential risk, much as I would have argued in the 1960s, if I'd been alive and an adult then, that anybody who was interested in the future of fossil fuels, even if we didn't have all the science, would probably be making a mistake if they completely ignored what the risks might be, if they were trying to understand the risk, systemically, for the planet, of the use at scale of these techniques for producing energy, once the rest of the world, meaning poor people in Indonesia, India and Africa and China, began to demand the level of consumption of energy that Americans and Europeans had taken for granted. But I still think that, in some ways, the catastrophic or existential risk piece is not a risk that I believe the world is likely to be facing in five years, or eight years, or ten years. That's maybe something we can go into in the Q&A, why that is, but I suspect that the level of delegation we have already engaged in with AI systems doesn't get to the point where they can protect their purview and power, without our intervention, as well as they might someday. And obviously that requires further thinking.

[00:38:43] But that leaves some seriously interesting issues that I think really deserve attention more in the short term. For one, this question of where causal responsibility lies when a system that deploys AI acts in a way that is not safe. Think about the autonomous vehicle, but not only the autonomous vehicle; think about the AI system in a large company that increasingly is making financial decisions, reviewed by humans, perhaps needed by humans, but increasingly in an autonomous way. I think that problems involving power and collective action are really interesting in this space. So if you're running a large company, and suddenly 27 percent of those jobs, or the functions done by different people in different jobs, are now going to be done by AI systems, how does that redistribute power within the organization? How does the advent of lethal autonomous weapons influence the distribution of power in geopolitics, for example? How does it empower countries with smaller armies, and so on?

[00:39:46] Another point, which is familiar to people working on cars in particular, is that precision can spur disagreement. Right now, a lot of legal rules are written in fairly general terms, which is to say: humans are not supposed to drive when they are impaired; they're supposed to engage in driving that shows reasonable care, etc., etc. These are fairly vague descriptions, and the courts figure out what they mean in particular fact patterns, with the help of a jury. But when you can actually program an automated system to make split-second decisions that are extremely precise about when and how to prioritize exposing some smaller number of humans to risk when you can save a larger number of humans, like a variation of
a trolley problem, that will spur disagreements that didn't exist before. Just like mapping technologies, when they developed and became more precise, spurred disagreements between countries that previously shared borders in very inhospitable locations, when the border could not really be traced with quite as much detail and specificity.

[00:40:51] Just to mention maybe one or two last quick things on this slide: I think it's going to be really interesting as AI systems pose the question of what it means to maximize social welfare. Like, how do you design a system that is going to have as its core attribute, this is what you want it to do, and some people are trying to do that, that it's going to try to keep humans safe, or it's going to try to avoid doing anything that will imperil too many people? Taking human values and turning them into code is actually really, really difficult. And it is related to the process through which humans think about change and conflict, which is to say: we often deal with conflict through institutions like courts and legislatures. Increasingly, as we deal with conflict through machines, we'll have to program machines to help defuse conflict, and not only to point out how two views that seem to be very similar are actually in tension with each other.

[00:41:44] All right, so we've gotten to my very last slide; I'll end here. There's probably too much text on it, but here's what I want to highlight. If you are listening to this lecture and you're thinking, "I hope that part of my career is spent thinking about how I can help move AI, and the design of AI, so that it is socially beneficial," I want to highlight to you that that is actually really difficult to define, probably in ways you've already anticipated. But I want to highlight in particular a tension between two different ways of thinking about what the social good is, for purposes of AI and pretty much everything else. In one version of what it means to work for the social good, you basically develop systems that increasingly are good at giving people what they want: what they say they want, but especially what their behavior indicates that they like and that they value. So the entertainment that they want, the products that they want, the classes that they want, the kind of teaching that maximizes student evaluation feedback, and so on. But of course, part of what makes life so interesting is that there's sometimes a separation between what people say they want and what they actually want, or between what people say they want and what they do, or, for that matter, between what people want at time one, when you started listening to this lecture, and what you want right now, which is probably to stop, right? And once you start admitting to the idea that human welfare is more complicated, and further, once you start designing systems that are in real time shaping human affect and culture and behavior, it actually becomes really, really difficult to know where to land: how to take advantage of the human knowledge you have to know how to make humans better off.

[00:43:21] I don't know how we solve that problem. But I do know that the things that I do as a judge, and the things that we do in law schools as lawyers, and the things you do as technical experts, are increasingly merging, and I don't think that we can answer these really tough questions without acknowledging that our bodies of knowledge have a border that is increasingly becoming really blurry. And with that, I'm going to stop, and thank you for listening. I'm looking forward to your comments and questions and feedback, concurring opinions, dissents, whatever you want to share.

[00:44:02] So I think, yeah, the way we're going to go for questions and thoughts is: either raise your hand or put it in the chat, and then they'll just call on you or read the question. Great.

[00:44:23] I should add that if we were in a real classroom, what law professors do is we call on people. So I would call on people, but I can't really call on people, so I'm going to wait for your questions.

[00:44:34] I have a question. You mentioned that a lot of the laws, right, that will be used for AI actually already exist. Can you give an example of an existing law that you think will be used for AI applications or systems pretty soon?

[00:44:52] Absolutely, great question, thank you. The short answer is, let me start with the common law, which are the subjects that we
teach law students in the first year, and which are sort of defined by the fact that the law is a little more judge-made. As you learn in basic civics of a system like the American system, in most cases the legislature is elected to enact what the law is, then the executive branch implements it, and then the courts judge and interpret what the law means. But there are certain branches of the law where, in our Anglo-American tradition, the law is actually first developed by judges over time, little by little, case by case, and then the legislature jumps in and tweaks the law this way or that way. Those bodies of law include contract law and tort law, and both of those are so clearly about AI at some level. Contract law is the law of the promises that we make to each other, and of when they're binding and when they're not. When Professor Sadigh promised you a good class, I think she's delivering, as are her colleagues in this class. But if you say, well, the class wasn't good, I've been defrauded, the law will try to determine whether there's an actual legal claim that you have or not. So in the AI context, just imagine for a moment that increasingly transactions are being made by two AI systems making spot contracts with each other in a split second, because you've pre-programmed one to say: as long as this stock falls below this price, buy a whole bunch of it. And when the lines of supply and demand cross, because two AI systems are talking to each other, the deal is made. But then it turns out that maybe these were class C shares of stock, not class A. So who gets stuck, you know, dealing with the cost of a transaction that is not what both parties wanted? Existing contract law has a lot to say about that.

[00:46:44] Now, tort law is an area of law that, frankly, when I was a law student I thought was really boring. When I was a law professor I thought it was kind of technically complicated and not that interesting to me; I had my hands full teaching other stuff. As a judge, I think of it as fascinating, really, really interesting. That's the body of law governing when your conduct harms other people and when you are liable for that. There is no way to have a discussion about automated cars without tort law being a big part of it. So, to what extent is the designer of the software that runs the vision system for the car responsible for the person who gets run over, versus the person who runs the company that tested the software, versus the company
that designed the car and marketed it to you, versus the driver who pushed the car to operate in really bad weather? Tort law is really complicated, but I will give you one quick insight, which is very intuitive. One workhorse concept in tort law, which has been on the books for a while, is the notion that the law should pay attention, other things being equal, to who in that chain of causation was in the best position to have avoided the harm at the lowest cost: the least-cost avoider. And you can see how that would be a really interesting and important question here: who could have done a very little bit to keep that person from getting run over? There are dozens of other areas of law, but this to me is a really good example of how, when Silicon Valley says, oh, we have to decide whether AI is going to be regulated, I think there's a little disconnect with reality.

[00:48:25] Okay, I think we have... go ahead, yeah.

[00:48:34] Yeah, thanks, I have a question. These systems are probably very opaque to judges and to the people who actually have to make decisions about, you know, what happened and how it interacts with case law and statutes and whatever. My question is: if, as a judge, you have to answer a question of fact about a particular AI system, do you just have experts come in and testify, on the theory that nobody else has any hope of understanding it, so we're just going to take what they say as gospel? Or do judges and clerks actually try to educate themselves on the math behind how this stuff works?

[00:49:14] That is an excellent question, and I have good news and bad news. What do you want to hear first?

Let's hear the good news.

Okay. The good news is that in some ways the problem you've just described is not
completely new to law. In our system we have a kind of interplay of decision making. It involves jurors, who are asked, subject to instructions given by judges, how to interpret an ambiguous fact pattern: lay people who are supposed to be making their best conscientious effort to do the right thing and follow the instructions. Then there are experts, who in an adversarial system are selected carefully, vetted carefully, and debriefed before they come before the court, then subjected to cross-examination, and who can shed light on things and help the jurors and the judge make decisions. And then there are judges, who are supposed to resolve questions as a matter of law, questions that ultimately are more about how to interpret the legal issue itself. If, for example, a statute says that a quote-unquote "highly autonomous system" shall be regulated subject to subpart J, then the question of whether this is a highly autonomous system or not is a mixed question of law and fact, and the judge, for example, might do part of it. In particular, the expert-testimony piece will frequently involve experts who come and pick at the really intricate, math-type questions you're raising. I wouldn't say the system is perfect, but I would say it works okay. Other contexts where the highly technical gets adjudicated, much as it would in an AI context, would be DNA evidence, for example: base pairs, and what does it mean if you say the match is 1 in 1.7 billion? How do you know that? What's the difference between 1 in 1.7 billion and 1 in 3 million, which is what the other expert said? We have a way of dealing with that.

[00:51:10] Here is the bad news, though. I think the bad news is that none of the
technologies in the past have had the potential that AI systems do to talk back. And I think that is not a small thing, because what it means is that these AI systems can be designed in a way that creates a bit of a comforting illusion that even the experts understand what's really going on, when they may be influenced more by design choices that can be really, really hard to arbitrate: cases where the AI system is actually maximizing not the accuracy of what it conveys to the user about the mathematical basis for a conclusion, like the conclusion that this person is likely to reoffend, but instead maximizing the probability that the decision maker being influenced by the AI system is going to agree with it. And that could even be an expert who's testifying, right? So this is where AI accountability begins to merge with cybersecurity, because ultimately cybersecurity problems are very much about how, if you go back literally to supply chains and how you can mess up the very core architecture of how a microprocessor works, the ways you can bias results can become incredibly difficult even for an expert to pick apart. And I don't think we have a great answer for that. There may be blockchain-type, really fancy ways of using, you know, hardcore encryption to have greater confidence in results and to know when things have been messed with. But somewhere along the line there are humans, and humans are imperfect, and I just worry about that piece of it a lot.

[00:52:59] Awesome. We have a couple of questions in the chat. Are you able to view the chat?

Let's see, I see three participants with raised hands. Do you want to take
both of them together?

Yes, let me see. From a global perspective: do you see any collaboration between countries in regard to AI adoption, or will AI systems from countries with different values be prevented from being adopted? Okay. And then, similarly: just as developing countries strive to attain the same quality of life as developed countries, it seems just a matter of time before AI becomes the next such thing. How should we think about international cooperation and rivalry in AI development? Is there anything we can do as technologists to help? How should we navigate a time when the U.S. has put export restrictions on AI software?

[00:53:56] Okay, great questions. Let me start with the second one a little bit. [00:54:04] I think that as technologists you can probably help by trying to make sure that the hype doesn't run away with the discussion of these issues. I can find people who are in the national security world, and I can find people in the public intellectual world, who will see the relationship between the U.S. and other countries so much through the lens of rivalry that very little space is left for any collaboration between scientists, for example, or between civil society nonprofits that are trying to reduce the risk of climate change by using machine learning tools, or whatever. And I think technologists will be important voices in saying: we can be legitimately concerned about how differences in technological development affect geopolitics or relations between different countries, but we should not run to the conclusion that everything is pure competition. I bet there are people in this class who were not born in the U.S. Well, I know that's the case, because I wasn't born in the U.S., and I know that's true of others of you. To me that's a really poignant reminder of the risks of having the conversation about AI shut down to the point that it becomes too one-sided and too much just about national advancement. In no way do I want to deny, and this gets me a little into the earlier question, that there are different agendas, different goals, different geopolitical objectives of different countries, and that getting some advantage in AI technology can translate into potential military and economic advantage. So some balance has to be struck, and that requires careful discussion; it requires norms; it requires some cooperation from universities, because we think of universities as working best when things are mostly pretty open. In the university we share knowledge: I learn from you all, you learn from me, and so on. But the reality is
sometimes careful lines have to be drawn. And ultimately that does reflect the reality that countries have different values. I'll just mention one word that is a simple way of making that point, though there are other examples I could give you. The word is privacy. I can imagine many countries in the world that could argue either that their populations simply have no particular reason to value privacy the way Americans do, or that, whether their population values privacy or not, their law is such that they prioritize other things, and they're simply going to gather a ton of data about a ton of stuff. I don't think that data translates automatically into greater power in the AI space, but it is a significant advantage if you have it. It will be interesting to ask, in the coming months and years, to what extent some combination of reinforcement learning and the generation of artificial, sort of fake, data, or fake information inputs into a reinforcement learning algorithm, can make up for raw access to real-world data. But I still think a corpus of actual human behavior is quite something. And if you simply didn't have to worry about privacy, the insights you might get into the expressions on people's faces when they're having a private conversation about something incredibly sensitive, or into the fear you can see in somebody's eyes when they begin to realize they've said something on social media that is likely to bring a knock on the door from the police, are valuable, particularly if your goal is not only to improve the lot and well-being of the population but to control them and to limit the extent to which they push against you. So I think this leaves us in a really challenging space. I would really, really urge all of us to highlight the importance of some public collaboration across borders on this. We don't stand much of a chance, I think, of getting to where humanity needs to get on so many crucial issues, including, by the way, AI safety, if we don't have some sharing of information. Pure competition is going to drive a lot of the most dangerous and riskiest technological experimentation on the ground, at least for a while. On the other hand, I think we'd be naive to think that everybody shares the same interests. So some degree of norm-building and cooperation among communities of people in civil society, in the world of nonprofits or philanthropy or education, will I think be really crucial.

[00:58:33] There's a question? Yes.

The general question, this comes to be real: if I just say I've developed projects
if i just say i've developed projects and i have to connect all the [00:58:44] and i have to connect all the information from the network [00:58:46] information from the network uh [00:58:47] uh so what should i [00:58:49] so what should i share or take care of [00:58:52] share or take care of something like the copyright [00:58:54] something like the copyright privacy [00:58:56] privacy and the commission from the source [00:58:57] and the commission from the source society or even some citation i have to [00:59:00] society or even some citation i have to prepare [00:59:02] prepare this kind of thing [00:59:08] yeah yeah thank you i this is a really [00:59:10] yeah yeah thank you i this is a really big subject let me just try to abstract [00:59:12] big subject let me just try to abstract a little bit from your good question [00:59:14] a little bit from your good question you're basically [00:59:16] you're basically raising the broader question of how we [00:59:18] raising the broader question of how we might think of ownership and [00:59:20] might think of ownership and responsibility over data particularly as [00:59:23] responsibility over data particularly as people work together on these projects [00:59:25] people work together on these projects that mine huge amounts of data [00:59:28] that mine huge amounts of data and use it in maybe different ways i [00:59:30] and use it in maybe different ways i mean [00:59:32] mean i and i'll give you a short answer but [00:59:34] i and i'll give you a short answer but then i'll elaborate a little bit the [00:59:36] then i'll elaborate a little bit the short answer is increasingly the world [00:59:38] short answer is increasingly the world is waking up to the fact that control [00:59:40] is waking up to the fact that control over data really is control over [00:59:41] over data really is control over property in some ways so just as you [00:59:44] property in some ways so just as you might have a [00:59:46] might have a 
use agreement that says to somebody, you can license the use of this technology that I've developed, let's say a piece of hardware like a camera that can see really well at night, but by using this camera you are agreeing not to use it to look into people's homes without their permission, or something. So increasingly, I would say, there is a criss-crossing regime of law, some of it state law, some of it federal law, pertaining to particular classes of data, like medical data, that really highly regulates the use of data.

By the same token, are there still opportunities to harvest, even scrape, data from, say, the public internet that can then be used in different ways? Well, of course, and sometimes that will allow breakthroughs to occur in AI. But this is where it gets really tricky, because we're really in the midst now of developing national, and then eventually maybe global, norms about what it means to appropriately design AI systems that will get rid of data that are no longer needed for the original purpose. So let me give you two competing perspectives on that.

The first is that because AI systems are so capable of developing insights, using the techniques you are learning in this class, that discern patterns in the data that human intuition would not have been able to detect, big masses of data used in new ways are risky: it means that maybe you end up getting embarrassed by people discovering that the fact that you like a certain kind of literature, and that your eyes move in a certain way when you're in a conversation, means that you have a really short attention span and can't be trusted with a certain kind of job, right? So those questions are partly being mediated with
respect to, like, do we not create norms that once data are used for the purpose for which they were collected, we destroy the data?

Now I'll give you the perspective of me, the academic. A lot of my early work as a law professor was historical. I would actually look at old memos and documents going back to presidential decisions made in the Roosevelt administration, where I was looking at what happened when Roosevelt was trying to reorganize the government on the eve of World War II, and how he was trying to protect certain programs from being defunded as the country was getting ready to go to war. By the way, as a little subplot on that, one of the things I learned that I was not expecting is that there was a big biological weapons research program being funded with White House support, despite the fact that that arguably contravened, certainly was contrary to, statements the White House had made, and arguably contravened certain aspects of legal norms at the time. But long story short, the point is: if the norm had been followed that you only keep the data for its original intended use and then get rid of it, where in the world would I be able to write the stuff that I did? How would I be able to do it? And you could say, okay, well, those are presidential records, that's different. But in general, the historians who write about how humans lived 150 years ago are doing it with data that were not intended for historians. So I think we have to strike a balance. But I would say you should assume, whenever you're dealing with data, that there's probably some rule, and if there isn't some rule in the law, it's probably in human subjects requirements at a university.

[01:03:17] [Audience question] Yeah, and my question is with regards to
the out-of-lab versus in-the-lab behavior of models, which you mentioned. So let's say an organization develops a model, let's say for self-driving cars, for example, I'm just picking that. 99% of the time, inside the lab, the model works very well, and outside the lab, you know, it causes the loss of a person's life. I'm taking an extreme example here. So if this kind of situation comes to you as a judge in your court, what is the decision making on the law side with regards to the AI model, and what is the thinking that goes into making this kind of judgment against the maker of the model? It's working 99% of the time, but there are failures, because, you know, the models are true to what has been coded and what the training has been done on. So I'm just wondering about the decision process: is it tied to the risk, or to whether the company has done the due diligence, or what is the essence of the responsibility? So that was the broad question. Thank you.

[01:05:01] Thank you. So that's another good opportunity for me to share with you how much existing law is already grappling with these issues that arise that are so relevant to AI, and particularly to the scaling up of AI outside the laboratory. I'll preface this by saying that because I'm a sitting judge, I wouldn't want you to feel like I'm telling you exactly how I would decide the case if it came up, because we actually have cases not unlike what you're describing that are pending in the California courts, and I'm not supposed to say how I'd decide them. But I can tell you in general the bodies of
law that are relevant to trying to deal with this question, and in what direction they've moved over time. So we have bodies of law, particularly in tort law and in contract law, and in consumer protection law more generally, that can use basically three sorts of techniques to deal with the risk that you're pointing to, which arises when you go from highly efficacious behavior in the lab to what happens when some technology is operating, quote unquote, in the real world. So when a vision system, for example, is tested in controlled conditions, it works fine, but now you're putting it on the front end of a car that is going to operate mostly autonomously, and it's driving around Palo Alto, and then even driving around some much more irregular environment, like some dusty unpaved road in northern Mexico.

So one body of law is tort law. Again, this is the body of law involving the duties that you owe, for example, to others as a company or as a person. And here a core insight of tort law is that you have a duty of care. If there's a theory of negligence that is being, well, let me put it this way: if the claim is that the manufacturer should have been more careful than the manufacturer was, and the manufacturer owes a duty to the person who's using the product, which is its own separate question, a crucial issue will be the extent to which prevailing norms in the industry about how much testing happens outside the lab were followed or not. The more those norms converge, the easier it is for the company to say, perhaps: well, look, we did some outside-the-lab testing; you know, you could easily spend a billion dollars a day testing outside the lab infinitely, but we did enough testing that we met the industry norm.
A different technique would be to rely more on contract law, where you could say: I was sold a product that was guaranteed to me to have a degree of safety and efficacy, and in fact it didn't reflect that, because it wasn't tested outside the lab, and the promise that was made to me was not that this product was just tested in the lab; it also implied a lot of testing outside the lab.

And the third strategy is more administrative regulation. This is like what the FDA does with respect to pharmaceutical products, and here the key insight is that we don't rely purely on tort law or contract law; we actually have the government saying to you: you can only sell this product if you've tested it in a particular way. And as you go through that pharmaceutical approval process and get into phase two, phase three, phase four trials, effectively what you're doing is going further and further outside the lab. So we can do this to some degree. There are going to be some nuances, but I think it's really important to remember we have different tools we can use to deal with that risk.

[01:08:35] Thank you. [Moderator] Notice we have a couple of questions in the chat, so let's do some of those and then come back to the raised hands. Yes, I think maybe we have time just to take these two questions. So first, on the criminal justice system. [01:08:55] Oh, I see that, okay, I'm going to just read it out: if we find that an algorithm deployed in the criminal justice system doesn't explicitly take race into account but is systematically discriminating against Black people, perhaps COMPAS, what are the legal ways to counter that discrimination? Is it constitutional to take into account a protected characteristic like race? Oh great, okay. Or does it mean we
just shouldn't use the algorithm? Okay, excellent question. So the short answer is that the legal system in America has, for good reason, long been deeply concerned with racial inequities. Whether it has been sufficiently concerned is a question that we can leave for another day, and others can talk about, but it has been concerned about it, and that concern is reflected in several parts of our legal system. It's reflected in a lot of statutes at the state and federal level against discrimination by race, like provisions against discrimination in employment and hiring. But it's also reflected in the Constitution, for example in the Equal Protection Clause. And here I would just say the legal system treats uses of race differently from other classifications that people might be subjected to. They are subjected to what the legal system calls strict scrutiny, which is a very, very demanding form of review where essentially explicit classifications are not permitted unless there's a very compelling, strong justification on the part of the government and there's really no realistic way of doing it in a different manner.

Where it gets more complicated with something like COMPAS is where there's no explicit racial classification and you still have biases. And here I just note that you can find algorithmic ways to reduce that bias, or even make it disappear completely, and there might even be legal reasons to do that. But generally when you do that, one of two trade-offs will happen. You will either increase the likelihood that other variables that may not otherwise be so consequential become more consequential,
so to some people that might mean you're introducing a certain kind of different bias, although we may not care about it as much because it may not be racial; or you simply reduce the accuracy of the overall model in some cases. Now, that may be entirely sensible to do, but these questions about when and how you recalibrate the results of a decision-making process, because it doesn't take race into account and yet still gives unequal results, are a very familiar and vexing and difficult question in criminal justice and in the legal system more generally.

[01:11:39] [Moderator] Let me just take the last question then. Which one is it, the one that says "via advertising"? Complicated question, actually. Okay, so my question is regarding that one: how much tolerance would society have when AI makes mistakes? On the one hand, we say humans are not perfect; on the other hand, we want to see how much a company can improve an algorithm to avoid an accident caused by their AI-powered product. To what extent should AI product manufacturers optimize their product for it to become acceptable? Great question.

[01:12:10] I think this gives us a chance to end where we began, actually, because when I started, I highlighted to you that not only can AI have many benefits for society, and I named some, but also that the relevant comparison is not to perfection but to what imperfect forms of decision making we might have to rely on if we don't rely on a particular AI system. But it would be wrong to conclude from that discussion that as long as AI systems are more accurate than human decision makers, there's no legal problem or no policy problem. I think instead the reality is that as AI performance improves in a very discrete domain, two things might happen
that are relevant to the answer to your question. The first is that we may come to understand and trust AI systems better to make that discrete decision, so long as they don't introduce some other biases that we think of as even more concerning. So notice the point that I made about how AI systems might be really good at picking out faces that are unfamiliar, relative to humans who are picking out unfamiliar faces, but if they're trying to discern emotion, they may not be as good as humans right now. That might change over time, but right now it means that we have to be very specific with respect to what we're expecting a system to do, rather than presuming that there's a sort of halo of efficacy beyond the very narrow context in which it's been tested, which might have to extend just beyond the lab, right?

Number two, as AI systems get better in general, the standard of care that the legal system uses to discern whether something works effectively or not will begin to be redefined, so that it's not just human efficacy; it's a well-performing AI system's efficacy, and not only a well-performing system but ideally a well-performing AI system that does not have a built-in set of biases that we consider problematic. So that means, for example, if over time 70% of human passengers were in autonomous vehicles rather than human-driven ones, then a faulty form of performance from one of those vehicles that increases the risk of harm, and actually results in somebody getting harmed, might still be actionable, despite the fact that even that faulty system works a lot better than human drivers did. It also means that even if AI systems are better at discerning new faces, if
their accuracy is much much better for white faces than black faces [01:14:45] for white faces than black faces that would be a policy and potentially [01:14:47] that would be a policy and potentially legal problem for some [01:14:49] legal problem for some that might require remediation and [01:14:51] that might require remediation and attention even if the system works [01:14:53] attention even if the system works better than humans [01:14:55] better than humans all of which is why i think these [01:14:57] all of which is why i think these problems are going to keep your [01:14:58] problems are going to keep your generation busy for a long time [01:15:02] thank you all right [01:15:04] thank you all right thank you so much you know thanks for [01:15:06] thank you so much you know thanks for the great talk and awesome discussion [01:15:08] the great talk and awesome discussion and there are lots of interesting [01:15:09] and there are lots of interesting questions this is really fun [01:15:11] questions this is really fun uh so let's thank you know again [01:15:15] uh so let's thank you know again thank you everybody i really enjoyed [01:15:16] thank you everybody i really enjoyed this and i appreciate your very [01:15:18] this and i appreciate your very thoughtful questions and best of luck [01:15:25] you ================================================================================ LECTURE 052 ================================================================================ Stanford Fireside Talks: Robustness in Machine Learning I Robust Machine Learning Source: https://www.youtube.com/watch?v=xr8AHGlieOE --- Transcript [00:00:05] so today we're pleased to have tatsu [00:00:08] so today we're pleased to have tatsu hashimoto here with us um tatsu did his [00:00:11] hashimoto here with us um tatsu did his phd at mit did a post talk at stanford [00:00:15] phd at mit did a post talk at stanford spent one year [00:00:16] spent one year as a researcher at microsoft 
semantic machines and he's joining stanford um as [00:00:21] of last month as a fresh assistant professor so welcome [00:00:25] welcome back to stanford he'll actually be teaching 221 in the winter [00:00:30] so if you like his talk you should go tell all your friends to have them take 221 in the winter [00:00:37] so tatsu has worked on a number of areas from computational biology text [00:00:42] generation and nlp but he's probably really well known for his work on [00:00:48] you know robustness in machine learning [00:00:50] and i think throughout this course um we've emphasized that machine learning is something that's really [00:00:58] being deployed in the real world all over right now and having real impact in [00:01:03] the world just last week we heard from you know about this so i think [00:01:07] robustness of machine learning systems is a really really important area and [00:01:11] tatsu is an expert in this so i'm really happy to have him tell us [00:01:15] what robustness in machine learning is all about and where things are at the [00:01:19] moment so take it away tatsu [00:01:23] okay great [00:01:25] um so i want to start with emphasizing sort of what percy already said which is [00:01:30] that there's been this enormous and rapid progress in machine learning [00:01:35] over the last decade or so and especially in tasks like image [00:01:39] recognition um 10 years ago [00:01:42] errors were at the level of like 20 to 30 percent and human level performance [00:01:46] was you know sub uh seven percent [00:01:50] and there was this huge gap in performance and everyone said it'll take [00:01:53] a long time to reach human level performance [00:01:56] but nowadays really human level performance is being achieved on all [00:01:59] sorts of tasks image recognition as of [00:02:01] say 2015 but also in tasks like natural language processing and much more [00:02:06] challenging reasoning-based tasks [00:02:08] these systems are now getting really close [00:02:10] if not exceeding human performance [00:02:13] and so machine learning has really achieved these sort of great successes [00:02:17] and they're being deployed and we can sort of ask what has machine learning [00:02:20] been good at and what is it good at and [00:02:22] it's really good at extracting patterns from training data [00:02:25] and applying this on a test distribution to do some prediction [00:02:30] and so we can think of this as you know the classic digit prediction task you have [00:02:33] some images of digits and you need to return the you know numbers that are [00:02:36] associated with them as long as sort of the source and the target distributions [00:02:40] look the same [00:02:42] modern machine learning systems based on large amounts of data and neural nets [00:02:46] are going to do exceedingly well on these tasks but really the challenge is [00:02:50] what if um the training data doesn't look very much like the test data [00:02:54] um in these cases we're gonna have a lot of challenges
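The train/test mismatch described here can be made concrete with a small sketch. This is not from the talk — the data, the nearest-centroid classifier, and the amount of shift are all made up for illustration — but it shows the basic phenomenon: a model fit on one distribution degrades sharply when evaluated on a shifted copy of it.

```python
# Illustrative sketch (not from the lecture) of accuracy degrading under
# distribution shift, using a nearest-centroid classifier on synthetic data.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, shift=0.0):
    """Two classes drawn from unit Gaussians; `shift` moves the whole test
    distribution away from the training one."""
    y = rng.integers(0, 2, size=n)
    centers = np.array([[-2.0, 0.0], [2.0, 0.0]])
    x = centers[y] + rng.normal(size=(n, 2)) + shift
    return x, y

# "Train" on the source distribution: store one centroid per class.
x_train, y_train = make_data(1000)
centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in (0, 1)])

def accuracy(x, y):
    # Predict the class of the nearest training centroid.
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return (dists.argmin(axis=1) == y).mean()

x_iid, y_iid = make_data(1000, shift=0.0)  # test data "looks like" training
x_ood, y_ood = make_data(1000, shift=3.0)  # shifted test distribution
print(accuracy(x_iid, y_iid))  # high: source and target match
print(accuracy(x_ood, y_ood))  # much lower: distribution shift
```

The same decision rule that is near-perfect in-distribution is close to random on the shifted data, which is the pattern the rest of the talk keeps returning to.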
so on the image that i put here you know in the [00:02:59] source domain we have these like black and white images and sort of desaturated [00:03:04] settings and now at test time you have these yellow cabs in new york um and you [00:03:08] know your predictions might not work so well once you have this what's called [00:03:11] distribution shift [00:03:13] and so once we start to think about going beyond just sort of data that [00:03:18] looks like the training data we see a lot of problems on the horizon and we've [00:03:20] discovered a lot of these problems beyond test accuracy [00:03:25] and i'm going to at the beginning of this talk cover sort of three classes of [00:03:28] problems [00:03:29] that hopefully [00:03:30] you'll think about as you sort of continue on your journey in ai and [00:03:33] machine learning the first one is sort of discrimination [00:03:37] and performance on minorities [00:03:39] another one is vulnerability to adversaries in high stakes secure [00:03:44] applications um and then the last one which [00:03:47] is a little bit more abstract but i will get into this in more detail is that [00:03:51] models don't really display an understanding of the tasks that they're [00:03:55] actually performing and this is going to be a little bit abstract but because this is an [00:03:58] ai focused class i think this is an important thing uh to be discussing and [00:04:02] going through [00:04:03] and so sort of the unifying theme like these seem like very different problems [00:04:06] right like problems that machine learning systems have today [00:04:09] but really they're all sort of connected with a single underlying theme which is that [00:04:14] many of these problems can be cast as problems in robustness [00:04:18] and so when the training distribution and the test distribution are different [00:04:21] these models break down because they're brittle [00:04:25] so to start with let's talk about sort of discrimination and fairness and [00:04:29] minority groups [00:04:30] so [00:04:31] a really typical thing that happens in a lot of machine learning systems today is [00:04:36] that there's sort of a majority group let's say you know western cultures [00:04:40] english text [00:04:41] or [00:04:42] sort of males in many cases so in this majority group that dominates the [00:04:46] training data you get extremely good superhuman performance in these systems [00:04:51] and often you're going to be deploying this to a wide variety of users and so [00:04:55] you will have minorities using your system [00:04:58] and in these cases you end up with horrible sort of near random performance [00:05:02] and you can sort of immediately see how this is a discrimination issue and sort [00:05:05] of an equity issue [00:05:07] and i'm going to go over a lot of these examples in turn but these just show up [00:05:10] in all sorts of places that you might not initially think about when you think [00:05:13] about fairness problems like say dependency parsing or video captioning
face recognition is a very common one [00:05:19] that people probably already know [00:05:21] but in these sorts of like common widely deployed ml systems you start to see [00:05:25] these gaps between how these systems perform [00:05:28] on majority groups versus minority groups [00:05:34] so the first one that i think is probably maybe surprising to many people [00:05:38] that there's these kinds of gaps [00:05:40] is a task called dependency parsing so [00:05:42] the input is just sort of sentences tokenized and sort of split up so an [00:05:47] example here is bills on ports and immigration were submitted by senator [00:05:50] brownback republican of kansas um and [00:05:53] the output is that you're supposed to analyze sort of the syntactic structure [00:05:56] of this sentence and create dependencies between what are called headwords uh and [00:06:00] their dependents and so you end up with what looks like a tree here [00:06:04] um and so the sentence above like the bills on ports and so on um can be [00:06:07] parsed into this v-shaped uh structure [00:06:10] here on the bottom and so this is called dependency parsing because there's these [00:06:14] explicit dependencies uh between tokens that show up um in your data [00:06:19] and in sort of classical nlp pipelines [00:06:22] such as say if you want to extract relations between uh people or entities [00:06:26] you know who was the person that submitted the bill in the sentence for [00:06:28] example you might use something like a dependency parser to look at [00:06:32] dependencies in your sentence and to extract relations right so this is sort [00:06:35] of a first step in terms of uh getting these kinds of more sophisticated [00:06:39] analyses in these sort of classical pipelines nowadays many things are sort [00:06:42] of end-to-end and neural um but that's sort of beside the point here [00:06:47] and what's sort of surprising or maybe not surprising if you've thought about [00:06:50] these kinds of problems is that these parsers do much much worse on [00:06:55] data that's not commonly used to train these dependency parsers so this is a [00:07:00] study from [00:07:01] su lin blodgett in 2016 where they took a bunch of different dependency parsers [00:07:06] and applied them to text [00:07:09] from standard american english as well as african-american vernacular and [00:07:13] that's the column labeled [00:07:14] aave [00:07:16] and the performance here is measured by what's called labeled attachment score so [00:07:20] that's how well do you reconstruct the tree [00:07:22] and the numbers here you know you might not really know how to internalize this [00:07:25] but you see these big gaps right so in terms of [00:07:28] standard american english you get these 57 uh sort of f1 score type accuracy and [00:07:33] then 43 on african-american vernacular and you get [00:07:36] a 14-point gap and sort of state-of-the-art for this task say you [00:07:39] know you're competing over like a one [00:07:40] point difference so these are enormous gaps uh once you go from standard american english to african-american vernacular
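The labeled attachment score mentioned here is simple to compute: for each token, the predicted head and the predicted dependency label must both match the gold parse. A minimal sketch follows — the toy sentence and its gold/predicted parses are invented for illustration, not taken from the study.

```python
# Labeled attachment score (LAS): fraction of tokens whose (head, label)
# pair exactly matches the gold parse. Checking heads only would give the
# unlabeled attachment score (UAS).
def las(gold, pred):
    """gold/pred: lists of (head_index, label) pairs, one per token."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)

# Toy 3-token sentence "bills were submitted" with hypothetical parses;
# head index 0 denotes the root.
gold = [(3, "nsubj:pass"), (3, "aux:pass"), (0, "root")]
pred = [(3, "nsubj:pass"), (3, "aux"), (0, "root")]  # one wrong label
print(las(gold, pred))  # 2 of 3 tokens fully correct
```

With scoring this coarse, a 14-point LAS gap between dialects means roughly one extra token in seven gets a wrong head or label, which compounds badly in downstream relation extraction.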
[00:07:47] and these kinds of things can have huge downstream impact um if they're used in [00:07:51] things like relation extraction or qa systems right because texts from [00:07:55] african-americans are just systematically not going to get [00:07:58] extracted into say relations or entities [00:08:00] when you build knowledge bases and things like that [00:08:03] and so you might sort of see how this begins to affect these kinds of minority [00:08:07] groups [00:08:09] through these kinds of robustness problems [00:08:14] another example [00:08:16] is video captioning so many of you have already interacted with systems like [00:08:20] this through youtube's video captioning system [00:08:23] where the input is you know you have a video with some spoken text audio and [00:08:27] the output is text captions that are automatically added to the video [00:08:33] and these things are increasingly important because say if you have um [00:08:37] i know that in medical domains if you [00:08:39] have medicaid funded sort of videos that you need to put up on the internet you [00:08:42] need to have captions and so in these cases you either run these [00:08:46] systems or you have people transcribe the videos [00:08:51] and what's been found is that these kinds of systems work a lot worse for [00:08:54] women um so this is a study by rachael [00:08:56] tatman in 2017 [00:08:58] where she basically showed that if you took uh male versus female speakers and [00:09:02] you ran them through [00:09:04] youtube's video captioning system you get systematically higher error rates [00:09:08] for women and you see that sort of the median error rate is essentially the [00:09:11] upper quartile error rate for men so [00:09:13] that's actually a pretty substantial difference in the word error rate [00:09:16] between these two groups [00:09:18] and you also see sort of expected differences between dialects which is [00:09:21] you know scottish speakers get substantially worse [00:09:24] um video captioning accuracy uh whereas [00:09:27] you know speakers from california uh get really good word error rates and so you [00:09:31] know you can sort of see how this sort of manifests right youtube being based [00:09:34] in california obviously dogfooded with people with californian accents [00:09:39] and when tested out of distribution on scottish speakers suddenly performs a [00:09:42] lot worse [00:09:44] and so these are the kind of robustness problems that you initially don't think [00:09:47] about because you sort of think about [00:09:49] well is our model performing well on you know really a complex uh input and so [00:09:55] you might put in some really complex inputs as a california speaker but [00:09:59] really you haven't tested out of distribution on scottish accents [00:10:05] and then we'll come to another example which many of you hopefully already know [00:10:09] in facial recognition this has sort of been really widely discussed even in the [00:10:13] media
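The word error rate behind these captioning comparisons is the word-level edit distance (substitutions, insertions, deletions) between the reference transcript and the system output, divided by the reference length. A standard dynamic-programming sketch, with invented example sentences:

```python
# Word error rate (WER) via word-level Levenshtein distance.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,            # substitution (or match)
                           dp[i - 1][j] + 1,   # deletion
                           dp[i][j - 1] + 1)   # insertion
    return dp[-1][-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 sub / 6 words
```

Comparing the distribution of per-video WER across speaker groups, as the study did, is what surfaces the median-versus-upper-quartile gap described above.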
and just to go over what the task [00:10:15] is the input [00:10:17] is images um [00:10:19] possibly containing a face or not depending on the task [00:10:23] and you can do many sort of things with these images and there are many outputs [00:10:26] that are associated with face recognition or identification tasks [00:10:30] and so you might ask is there a face in this image and that's sort of face uh [00:10:34] sorry recognition um you might need to match a given face to uh database of [00:10:39] faces and that would be identification [00:10:41] um or you might need to predict attributes is this um [00:10:45] face [00:10:46] a female face or a male face or you know happy or sad you have many sort of [00:10:50] attribute prediction tasks that can be built on top of faces [00:10:54] and [00:10:56] this is one of the original studies i think in terms of highlighting [00:11:00] how bad these kinds of systems can be in sort of widespread ways and so [00:11:05] there's a study from the mit media lab [00:11:07] gender shades by joy buolamwini [00:11:10] in 2018 [00:11:12] where she basically took a whole bunch of um portraits of [00:11:16] legislators from different countries [00:11:18] african and i think northern european [00:11:21] and ran them through [00:11:22] different face [00:11:24] attribute prediction systems for whether or not they were male or female and what [00:11:28] you can sort of see on this uh on the top right [00:11:31] is that dark female skin uh results in much worse [00:11:36] gender predictions compared to light skinned males where you basically [00:11:40] have perfect prediction [00:11:43] and these kinds of things are pretty problematic if you've been [00:11:46] testing your systems on light-skinned people you think your system is near [00:11:50] perfect and so you might be using it for really high-stakes tasks [00:11:55] where you need 100 percent performance but [00:11:56] then when applied to these sort of darker skinned uh demographic groups you [00:12:01] end up with substantially worse performance and so [00:12:04] you don't even realize um the kinds of harms that you're causing by using these [00:12:08] kinds of systems [00:12:11] and what's sort of problematic and sort of you can see is that they reflect a [00:12:15] lot of the benchmark data that's been constructed for this task and so on the [00:12:18] bottom right here [00:12:20] um you see [00:12:22] the distribution of sort of skin color and gender for benchmark data sets in [00:12:28] this kind of like sort of gender um identification from face image tasks [00:12:33] and what you see is that there's sort of a systematic underrepresentation of both [00:12:37] females and darker skinned demographics [00:12:42] and you might say you know this really just reflects the underlying data [00:12:46] distribution and so maybe all we need to get is you know unbiased data you hear [00:12:49] this term a lot from i think people who [00:12:53] um haven't thought too deeply about problems of robustness but the issue is [00:12:57] that there's really no such thing as truly unbiased data in the sense that [00:13:00] there will always be an underrepresented group if you slice your data fine enough
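The methodological point behind Gender Shades is disaggregated evaluation: report accuracy per demographic slice instead of one aggregate number, because the aggregate can look near-perfect while a small slice fails badly. A minimal sketch — the group names and records below are entirely made up for illustration:

```python
# Disaggregated (per-group) accuracy, the evaluation idea behind studies
# like Gender Shades. The data here is purely illustrative.
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, gold_label, predicted_label) triples."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, gold, pred in records:
        totals[group] += 1
        hits[group] += (gold == pred)
    return {g: hits[g] / totals[g] for g in totals}

records = [
    ("lighter_male", "male", "male"),
    ("lighter_male", "male", "male"),
    ("darker_female", "female", "male"),   # the kind of error the study found
    ("darker_female", "female", "female"),
]
print(accuracy_by_group(records))
```

Note that overall accuracy on these four records is 75 percent, which hides the fact that one slice is at 100 percent and the other at 50 — exactly the gap a single aggregate metric would miss.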
such thing as [00:12:59] that there's really no such thing as truly unbiased data in the sense that [00:13:00] truly unbiased data in the sense that there will always be an underrepresented [00:13:02] there will always be an underrepresented group if you slice your data fine enough [00:13:04] group if you slice your data fine enough so we need to really just go beyond [00:13:05] so we need to really just go beyond thinking about balancing the data set [00:13:07] thinking about balancing the data set and we need to think about how can we [00:13:08] and we need to think about how can we make our models work well [00:13:10] make our models work well even for really small groups really [00:13:11] even for really small groups really small demographic groups and even [00:13:12] small demographic groups and even individuals [00:13:16] another task [00:13:18] another task that has these kinds of issues is [00:13:20] that has these kinds of issues is language identification so as an input [00:13:22] language identification so as an input you might be uh working at twitter and [00:13:24] you might be uh working at twitter and you need to identify the language of a [00:13:25] you need to identify the language of a tweet so that you can run a machine [00:13:27] tweet so that you can run a machine translation system and automatically [00:13:28] translation system and automatically translate a tweet into someone's uh sort [00:13:31] translate a tweet into someone's uh sort of speaking language [00:13:33] of speaking language but in order to do this you need to [00:13:35] but in order to do this you need to first identify what text the tweet is [00:13:37] first identify what text the tweet is written in right and so you might have a [00:13:39] written in right and so you might have a lot of different kinds of inputs and [00:13:41] lot of different kinds of inputs and this figure one shows the challenge in [00:13:43] this figure one shows the challenge in this task [00:13:44] this 
So you might have dialectal text: the top one is in Nigerian English, the second one is Irish tweets, and in the last one you have code-switching, a mix of both Indonesian and English. In language identification, when you're given these kinds of tweets, you need to identify the source language that they were written in, and so the output of the task is the language.
[00:14:08] What's been identified is that there are, once again, systematic biases in language identification, and one that's immediately a little troubling is that African-American English often gets identified as not English. So there's an implicit normative judgment being made here, that African-American vernacular is not English, and you see this error right here, with AAE having almost double the error rate of language identification compared to a
more standard American English data set.
[00:14:37] You also see this across languages. This is a study by Jurgens et al. in 2017, where if you sort the languages by the Human Development Index of their countries, you see this decreasing recall, decreasing accuracy, as the countries get less and less developed. That's because these countries often have under-resourced data sets, so there isn't as much data with which to train these language identification systems. So you see systematic biases in terms of how well developed and how internet-connected these countries are.
[00:15:11] And this leads to representational harms, right? If you're an African-American English speaker and a system tells you that you're not speaking English, that's kind of harmful. And there's also utility harms, right?
Like, if your text doesn't automatically get translated to English, your tweets won't reach as wide of an audience. So you can think of these as having pretty serious implications for fairness as machine learning becomes more widespread, more useful, and more impactful.
[00:15:38] And so there are these problems of serious, active discrimination. There was a story in the New York Times where a face recognition system identified a person as being a criminal, and this was faulty, and it was essentially the only reason for arresting this Michigan man. So if you have a system that's much more error-prone on African Americans, you're basically going to have a much higher error rate when deploying these kinds of algorithms. So you have these active discriminations and harms that are being
done. [00:16:09] But on the other side, right, as people taking and studying machine learning, we think that these kinds of technologies are broadly beneficial and useful, and increase efficiency. There's a study by Erik Brynjolfsson which found that the application of machine translation increased exports on eBay by 17.5%, because it's really easy to translate text, so people from other countries can buy your product. But if, for example, language identifiers can't identify your language, and so you can't use machine translation systems, then you don't get these benefits, right? So you get unequal access to the fruits of these kinds of AI systems. And so this can lead to harms in both ways: you don't get access to the benefits, and you get these kinds of active harms from the errors that these systems make.
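The per-group disparities described above are usually surfaced by an audit in the style of Gender Shades: compute the error rate separately for each demographic slice and compare. A minimal sketch with made-up labels and groups (nothing here comes from the actual studies):

```python
from collections import defaultdict

def group_error_rates(y_true, y_pred, groups):
    """Classifier error rate broken down by group label."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        errors[g] += (t != p)
    return {g: errors[g] / totals[g] for g in totals}

# Synthetic audit: the classifier errs on 1 of 4 examples from group "a"
# but on 2 of 4 examples from group "b".
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = group_error_rates(y_true, y_pred, groups)  # {"a": 0.25, "b": 0.5}
```

Because the group label can be any hashable value, passing tuples such as `("dark", "female")` audits intersections with the same function, which is how finer and finer slices of the data get checked.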
[00:16:59] And I'm gonna stop here, because I think fairness is a topic that many people have feelings and comments about, and I'd be happy to just sit and discuss for the next couple of minutes if anyone has questions about how fairness and these robustness questions interact with each other. (Yeah, there's a bunch of questions in the chat.) Oh, yes, sorry, I have full screen, so give me a moment to pull up the chat. Okay, I see it now.
[00:17:32] I'll start with the first question, which is about how having balanced, unbiased data is not enough. This is a very subtle point. There are data sets you can construct that will make you robust to certain kinds of groups, right? So let's go back to just this slide. [00:17:51] So if we look at this sort of
distribution of data, it's clear that, at least for the top one, we're probably going to have some bias in terms of light versus dark skin, because dark skin is so underrepresented. But if we balance this data out, we might still have unbalanced demographics in certain other areas, right? Maybe it's not dark versus light skin; maybe it's geographic region, or it might be income. These kinds of problems are innumerable. So what you really need is not the search for this unreachable, perfectly balanced data set, but a model that can do well on small amounts of data. You want a model that can take in these kinds of imbalanced data and do well on both the dark and the light skin. And the important thing in this task
is that there's no real trade-off, right? There's no real reason that you can't do well on both the light and the dark skin, and I think that's the crucial structure here: if you can do well on both groups, then it's not really about the amount of data or the distribution of data, it's more about the model and how you're learning it.
[00:18:58] The second question: is there a way to audit models without having access to the model? That's an interesting question. I'm not sure if you meant access to the model's outputs or something else. If you have access to the model's outputs, you can perform a study like Gender Shades, where you run the model on certain challenge examples, look at what the error rate is, and say: well, clearly we are doing much worse on dark-skinned females than on light-skinned males, so there's some sort of
bias. So you can audit models that way. It becomes much harder to audit models if you can't execute the model on your own data; then you have to do something a little bit tricky, and I think it requires very specialized conditions to be able to audit those kinds of models.
[00:19:46] Also, feel free to ask follow-up questions if I didn't answer any of these. So: similar to the issue with the person in Michigan, there have been efforts in applying AI to model future problematic human behavior. Oh, this is a comment. Yes, and that's highly problematic. I think in one of the earlier talks there was a discussion about how amplification and feedback effects are really insidious. And yeah, predicting the future, and acting on predicted future behavior, is even more problematic than the tasks I described here, because acting on the
real world will change the outcome, right? If you predict that crime will happen in a certain area, you assign more police, and you find more crime, and that's going to lead to a pretty vicious feedback loop. So you really need to think about the whole socio-technical system, rather than just the classification system narrowly, when you're in those settings.
[00:20:38] The last one: it seems that we can always slice our data into smaller subpopulations to test for fairness; are there industry standards for what we should usually start with? That's a great question, and also a really important academic one. There is an easy answer, which is that a lot of research and a lot of industry work has focused on legally protected groups. That's a well-defined set of attributes that you can't discriminate on, and so you can group by those, you can group
by intersections of those, and you can say those are the groups I shouldn't discriminate on. But academically this seems unsatisfying, because why should those be the only things we care about? There's also a lot of work on individualized fairness: making sure that you do well on individual people, making sure you treat similar people similarly, and things like that. That's a whole active area of research, and not really something where there's an obvious and clear answer yet.
[00:21:32] Okay. Any other final questions before I move on?
[00:21:42] Okay, so now I'm going to move on to the second point that I talked about before: that machine learning systems aren't really secure and can't really be used in many high-stakes situations. I'm going to start with one of the most well-known examples of this, called adversarial
examples. [00:22:02] On the left we have an image, this is a panda, and a classification system gets this mostly right: it's a panda with 57% confidence. That's great. Now what we're going to do is add a very specially designed and visually imperceptible perturbation. This middle panel looks like complete noise; we scale it down so that it looks just like zeros, and then we add it to the panda image and get the image on the right. Now we run our image classifier, and what we get out is that it's almost certainly a gibbon, which is completely wrong.
[00:22:33] So what this tells us is that we can find visually imperceptible perturbations that lead to very confident misclassifications. I'm not going to show you all the results of this adversarial-example work, but you can do this to almost any system, and you can completely and catastrophically destroy the accuracy of
all of these systems. This also happens in NLP systems and so on, so this is a really hard-to-avoid and almost universal behavior, and I want to show you how robust this kind of behavior is.
[00:23:04] It doesn't have to be images on a computer screen. It can happen by putting little black-and-white patches on a stop sign: on the left, a system is going to classify that as a yield sign instead of a stop sign. The middle one is a fun 3D-printed turtle, where if you run an object recognizer, it will say "gun" from almost any angle. And the right one is an adversarial sticker, where if you stick it anywhere and take an image, it's going to say that it's a toaster, instead of a banana, which is what it should be. So these come in very many different formats, but you have this same and kind of
disturbing phenomenon, where it's obvious to us that something like this shouldn't be tricking us. Black-and-white patches that small, or a weird texture on a turtle, shouldn't really fool us into changing our predictions, but it really fools these image classifiers.
[00:23:57] When you first see this, you think there must be a really simple fix, right? Maybe you run it through a JPEG compressor; maybe you add a little bit of extra noise to every image. And so this has led to an enormous number of papers, like a hundred or so over the last five or six years, in which people have tried a lot of different things to defend against these kinds of what are called adversarial perturbations. But the problem is that every time someone comes up with a defense, soon after someone breaks it by finding a better attack, or even
somewhat more disturbingly, just running the old attack for longer. So it kind of seems like this is a really persistent and serious phenomenon.
[00:24:42] I think the recent view of a lot of these adversarial-example-type problems is not that there's some really degenerate artifact in the way we train models or the way we optimize things. It's really just the fact that there are a lot of ways to build a high-performance prediction system, and many of the ways in which we can predict accurately rely on what we're going to call non-robust features. When we try to classify, say, a dog or a cat, we as humans rely on what we're going to term robust features: we try to identify the eyes and the snout and those parts, and these kinds of things are pretty robust to pixel-level perturbations.
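How a pixel-level perturbation exploits the gap between robust and non-robust features can be sketched with the classic fast-gradient-sign construction on a toy model. The logistic "classifier" below is made up for illustration, with random weights standing in for a trained network: every pixel moves by at most 0.2, yet the prediction flips from confidently one class to confidently the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image" classifier: logistic regression over 64 flattened pixels.
d = 64
w = rng.normal(size=d)  # stand-in weights; a trained net behaves analogously

def prob_class1(x):
    """P(class 1 | x) under the toy logistic model."""
    return 1.0 / (1.0 + np.exp(-(w @ x)))

# An input the model classifies confidently as class 0.
x = -0.1 * np.sign(w)
p_before = prob_class1(x)  # confidently class 0

# Fast-gradient-sign step: for this model the gradient of the class-1
# logit with respect to the input is just w, so every pixel is nudged
# by epsilon in the direction sign(w).
epsilon = 0.2
x_adv = x + epsilon * np.sign(w)
p_after = prob_class1(x_adv)  # now confidently class 1
```

For a deep network the input gradient comes from backpropagation rather than a closed form, but the mechanics, and the outsized effect of a uniformly tiny perturbation, are the same.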
But actually, low-level textures and really small image patches are also very predictive of the classes, dogs versus cats, let's say. And who are we to say that that's an incorrect way to make the predictions? Because when we train the model, all we're saying is: just classify these dogs and cats well. So you can think of this as saying our problem is underspecified. There are many ways to get a good classifier, and some of them really rely on the use of these non-robust features.
[00:25:59] And this has kind of serious security implications, right? If you're trying to make a self-driving car system, the stop sign being classified as a yield sign is pretty bad; you might run over a pedestrian. And this really prevents the application of machine learning systems in things like self-driving
cars, or at least we should be very hesitant, if we believe that these kinds of problems are inherent. Right, because the world is kind of designed so that stop signs are really easily perceptible to humans, and not necessarily designed so that small perturbations, say by putting on stickers, don't change stop signs into yield signs. And in other cases, vision systems, I think, are increasingly being used in high-stakes applications. We might reasonably imagine, say, at a TSA checkpoint, there's a camera running that tries to identify whether or not you have a gun. And if you can make these adversarial examples that, say, make a gun not a gun, or a turtle a gun, that seems very problematic: we can't use vision systems for those kinds of high-stakes applications that we might want to use them for.
[00:27:04] And so both of these really
pose challenges for the use of machine learning in these high-stakes, life-or-death kinds of settings. And I'm gonna stop here to take questions about adversarial examples for the next couple of minutes.
[00:27:22] Do I know why the first example was classified as a yield sign? That's a good question. With all these adversarial examples, the reason why they're being classified as yield is pretty confusing, I think. For example, why is this turtle being classified as a gun? I'm really not sure; it doesn't look anything like a gun, and the textures don't look like a gun. The way these things are constructed is by an optimization process: you're basically looking for perturbations on, say, a normal turtle texture that lead it to being classified as a gun. So there's no real interpretable reason why, say, this looks like a yield sign, or
this is being classified as a gun from every angle.

[00:28:06] Can I train on these examples to correct the classification? Oh, sorry, I skipped a question: how do we define a non-robust feature versus a robust one? Ah, yes. This is an ad hoc definition: the split here is just whether or not a feature can be flipped by changing the image slightly in pixel space. That's really the working definition of robust versus non-robust. If we were being more precise, the split should really be that visually imperceptible features are non-robust and visually perceptible features are possibly robust. I think that's a pretty reasonable split: anything that humans really cannot tell apart visually should not be used as a feature, as an input to a reliable prediction system.

[00:28:57] Can I train on these examples to correct the classification? I'm going to try to
interpret this question, because I'm not one hundred percent sure. I think what you're describing is an idea called adversarial training. The idea is that instead of training on just the input image, let's go back a little bit, instead of training on the panda, we try to train our system to classify this image, the adversarial image, as being a panda. And you might think, okay, this is good, we can now make this a panda. But now we need to prevent some other sort of adversarially designed noise from fooling the system again, because there are probably many, many different attacks that will change this image's prediction. So the idea you're describing is basically called adversarial training, and yes, that's one of the earliest defense approaches, and it's empirically reasonably effective.
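The two ideas in play here, the gradient-based search for a perturbation mentioned a moment ago and the adversarial-training loop just described, can be sketched in a few lines. This is a minimal toy illustration, not the systems from the lecture: the model is a bare logistic regression, the one-step sign-of-gradient attack is the FGSM idea from Goodfellow et al., and all the sizes (20 "pixels", eps = 0.05, the learning rate) are invented for the demo.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def fgsm(X, y, w, eps):
    """One-step sign-of-gradient attack on a logistic-regression model.

    For cross-entropy loss the gradient w.r.t. the input is
    (sigmoid(x.w) - y) * w, so each "pixel" is nudged by +/- eps in
    whichever direction increases the loss: a visually tiny change."""
    grad_x = (sigmoid(X @ w) - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad_x)

def adversarial_train(X, y, eps, lr=0.2, epochs=300):
    """Adversarial training: every gradient step is computed on the
    FGSM-perturbed batch (the inner attack) instead of the clean one."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        X_adv = fgsm(X, y, w, eps)
        w -= lr * X_adv.T @ (sigmoid(X_adv @ w) - y) / len(y)
    return w

# Toy data: 20 "pixels", label = sign of the pixel sum (all made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = (X.sum(axis=1) > 0).astype(float)
eps = 0.05

w = adversarial_train(X, y, eps)

X_te = rng.normal(size=(400, 20))
y_te = (X_te.sum(axis=1) > 0).astype(float)
acc = lambda A, b: float(np.mean(((A @ b_w) > 0) == (b == 1))) if False else float(np.mean(((A @ w) > 0) == (b == 1)))

print(acc(X_te, y_te))                       # clean test accuracy: high
print(acc(fgsm(X_te, y_te, w, eps), y_te))   # under the same attack: lower
```

Consistent with the answer above, the adversarially trained model keeps decent accuracy under this particular attack but is not certified against stronger, multi-step attacks; it is better than nothing, not foolproof.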
But you can still attack it with more sophisticated methods: you can find still visually imperceptible attacks after adversarial training that break the system. So this is not really a foolproof way of making models more robust, but it's better than nothing.

[00:30:12] Are you saying yet-unfound defenses are needed for ML self-driving cars to be secure from nefarious attacks? I think this is a great question, and I think I was being a little too aggressive in the things I was saying. It's an open question whether these kinds of attacks are really feasible in the real world, or whether they're things we should worry about. In the real world I can easily cut off a stop sign using a saw; that's an adversarial human attack, but we're not too worried about that attack, and so
maybe we shouldn't be worried about adversarial attacks on self-driving car systems. But I think there are two things this highlights. One is that we should be a little careful when we deploy these self-driving car systems: we should have fail-safes that, for example, don't rely just on vision. That seems pretty important. We might want to use radar or lidar (radar doesn't work well on soft objects like people, say, so lidar) to try to make sure we're not going to run people over when we mis-detect a stop sign. Having lots of orthogonal checks becomes increasingly important once you realize there are ways to fool these vision systems. And people are working on provably robust machine learning systems; maybe in settings like military applications those become truly important. And so
there is progress on that, but provably robust systems achieve much worse average accuracy than non-robust systems, so there's this big gap right now that we don't really know how to close.

[00:31:41] Okay: in my opinion, is research now shifting toward reformulating models to rely on robust features instead of finding ad hoc defenses? That's a good question. I think there is still a big gap between provably robust defenses and what you might call ad hoc defenses that work well for, say, one or two targeted attack types. But there are now things like randomized smoothing, procedures that in some sense get the best of both worlds: they're provably correct in some framework, and they're getting increasingly better. And so I think, for
high-stakes applications, we'll end up in a place where we'll lose some average-case accuracy, but not catastrophically so, and we'll still have adversarially robust models. That's where I imagine the field will go. It does seem like ad hoc defenses keep getting broken, so that's not really a path toward truly robust systems, even though they might make for more useful systems overall if, for example, adversarial training leads to more interpretable latent features.

[00:32:45] Does producing adversarial attacks require access to the model? If so, isn't this just an issue of info security, equivalent of... I can't parse that second sentence, but yes, I agree with the general sentiment. If you need access to the internals of the model, then really at that point you've rooted the system: if you're attacking, say, a medical imaging system
or you have access to someone's car, and if you're the Mossad, you can probably just mess with their brakes. So it's true that those attack models are pretty obscure and weird. But for one, there are what are called black-box attacks, which only require being able to evaluate the model. And for two, a lot of systems are shared, so you only need to learn to attack them once. If you're trying to attack Tesla's auto-driving system, you get a Tesla, you figure out the adversarial sticker that's going to make the system go haywire, and then you paste that everywhere. That doesn't require a particularly sophisticated threat model to execute. So I think there are threat models in which these attacks are real and problematic, even though there's a lot of questions and debate
about whether or not we should really care about this, the cost-benefit trade-offs of robust versus non-robust systems and so on, but it's an important thing to keep in mind.

[00:34:12] Any other questions on the adversarial examples part?

[00:34:22] Okay. And so now I'm going to get to the last part, which, given that this is an AI class, is maybe the most important of the three failures of machine learning. The previous one was about robustness; I think this one is about understanding. People throw this word around a lot, that models don't really understand, and it's hard to pin down what understanding is, but it's very easy to show when models don't understand. So we can go through some examples here, and we'll come back to them in more detail. This is from an overview of what people call shortcuts, cited at the bottom.
For example, let's say we're trying to caption this image here, so we need to describe in text what's happening. But sometimes these systems might just use the background instead of actually recognizing hills and skies and sheep. Adversarial examples arise maybe because we're recognizing textures and not actually recognizing the shape of things like teapots. And if we're doing medical image diagnostics, we might be looking at markings on X-rays, like which hospital the X-ray came from, instead of actually performing the prediction. In all these cases we're making use of pieces of information that shouldn't really be central to the task: they're not the core prediction tasks we care about, and somehow the model has picked them up and learned to do really well. I'm just going to group this broadly under the label of
shortcut learning.

The way to think about this is that when we train models in machine learning, we're training them to do well directly on the training set, and these days we now expect them to do well on the test set. But what we would ideally like, from systems that reason and understand and so on, is that they perform well on challenge sets: really difficult examples that we've constructed to break the model. You can think of it this way: there are a lot of possible rules that work well on the training set, fewer that work well on test sets, and very, very few that work well on these challenge sets, which capture the intended, true reasoning we would like our models to extract. And so we can think about machine learning today as: we've gone
from this tan-colored circle, where we were before, to this blue-colored circle, where we are now, where we do well on the test set. But where we really want to be is still further: we want to make sure we learn the right sort of mechanism. You may have heard the classic AI story of tank identification; I think this is a really old, Cold War kind of story. I'm going to read it out loud: the army trained a program to differentiate American tanks from Russian tanks. It got 100% accuracy on a test set, but later people realized that the American tanks had been photographed on a sunny day and the Russian tanks on a cloudy day. The computer had learned to detect brightness, not to detect tanks. And so this is exactly the kind of problem I'm talking about, where we have this extremely high
test accuracy and we are super happy, but then we realize we haven't learned anything about the underlying task. This story has been attributed to a lot of different people; I think it originally appeared in written form in Dreyfus's textbook. But it turns out it's not an actual real example: it can't be attributed to any actual experiment run by the army. The citation here is Gwern, who has a website where he has gone through all the possible attributions of this urban myth. The myth is so popular, at least in the AI and machine learning community, I think because there's a kernel of truth to the story. And we're going to go through several examples today of tasks with these kinds of failures.

[00:38:04] One that's kind of fun is this vision task where, apparently, people have tried to predict
gender from iris patterns. There was apparently some belief that this is a task you can perform, because you can actually get reasonably high test accuracy if you train CNNs on cropped images of irises and try to predict gender. But there's a paper that identified that this is not actually because of the iris: it's because female eyes often have mascara, and that systematically shifts the brightness of the images. This histogram tells a thousand words in one image. On the top you see the distribution for males, where the x-axis is the average brightness of the image. The distributions look pretty similar: males and females have a similar brightness distribution when the females have no cosmetics on. But if you restrict yourself to females with cosmetics,
this red distribution shifts to the left and the images become darker. So you see there's a very strong confounding effect: the female eyes have mascara and are therefore darker, and therefore these systems predict quite well based on this average darkness, even though apparently they weren't learning anything at all about the actual prediction task.

[00:39:26] Another one, from the Gwern website and its investigation into this tank phenomenon, which is interesting, is the Kaggle fisheries competition. You're given images of fish being caught on a fishing boat, and the task is to identify whether these boats are catching fish illegally: you're supposed to identify whether the fish are part of a protected category of fish you're not supposed to catch.
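The brightness-style confound above can be reproduced in miniature. This is a hypothetical sketch with invented numbers, not any of the actual studies: feature 1 plays the role of the spurious cue (say, image brightness), it perfectly tracks the label at training time, so a plain logistic regression leans on it, does great on an i.i.d. test set, and then degrades badly on a "challenge set" where the correlation is broken.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, lr=0.5, epochs=500):
    """Plain logistic regression fit by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def make_data(n, shortcut_tracks_label):
    """Feature 0: a weak 'real' signal (the iris, the fish).
    Feature 1: a shortcut (brightness) that either perfectly tracks
    the label or is random, depending on the flag."""
    y = rng.integers(0, 2, n).astype(float)
    real = (2 * y - 1) + 2.0 * rng.normal(size=n)   # noisy, weakly predictive
    shortcut = (2 * y - 1) if shortcut_tracks_label else rng.choice([-1.0, 1.0], n)
    return np.column_stack([real, shortcut]), y

X_tr, y_tr = make_data(2000, shortcut_tracks_label=True)
w = train_logreg(X_tr, y_tr)
acc = lambda X, y: float(np.mean(((X @ w) > 0) == (y == 1)))

X_iid, y_iid = make_data(2000, shortcut_tracks_label=True)   # same confound
X_ch, y_ch = make_data(2000, shortcut_tracks_label=False)    # confound broken

print(acc(X_iid, y_iid))  # high: the shortcut "solves" the benchmark
print(acc(X_ch, y_ch))    # much lower: the real signal was never learned well
```

The same mechanics apply to the other examples in this part of the lecture: any cue that separates the training set cleanly, however irrelevant, can dominate what the model learns.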
It turns out that on the training set you can do extremely well on this task using a very simple heuristic. These images come from a relatively small number of boats, so you first identify each boat, and then you identify, for each boat, whether or not it has been catching illegal fish. This approach does really well because it turns out only a few boats catch these illegal types of fish, and so by first identifying the boat and then identifying the fish, you can get extremely high accuracy even though you have learned nothing about actually performing the fish identification task.

[00:40:25] Another one that seems more high-stakes and problematic is in medical prediction. There's a lot of talk about tumor identification, or chest X-ray malignancy prediction, and in these cases it's pretty important to ask whether or not we're
doing well, because in these high-stakes situations you would like to make sure you're not being fooled by some feature that makes the task easier than it actually should be. There are often claims now of these systems performing just as well as human doctors in terms of diagnostic accuracy and so on. One really interesting, and maybe a little problematic, example is skin lesions that you're trying to classify as cancerous or not. Doctors will often put surgical markers on to highlight tumors they think are more serious than others, just so that when someone else looks at them they can immediately identify the more problematic ones. And the training set for these systems apparently contained a lot of these
markings. So there was an examination of these tumor classification systems where researchers artificially added markings to the images, and also cropped markings out of already marked images, and they showed they could basically flip the classification output of these systems. So in some ways the high accuracy of these classification systems is not because they're identifying tumors: it's because they're piggybacking on humans who, in many cases, have already classified the tumors as malignant or not.

[00:42:04] An early problem someone identified, in one of the earlier works here by Esteva et al., is that when people are trying to identify whether tumors are malignant, in serious cases they would include rulers to show how big the tumor is. And so the existence of a ruler would serve as a sort of spurious correlation, or a
sort of spurious correlation or a confounder [00:42:29] confounder in terms of [00:42:31] in terms of uh whether or not a tumor was malignant [00:42:34] uh whether or not a tumor was malignant and finally [00:42:36] and finally one that i think people are now aware [00:42:38] one that i think people are now aware about but initially i think people are [00:42:40] about but initially i think people are um [00:42:41] um sort of not as aware of is that hospital [00:42:44] sort of not as aware of is that hospital id often serves as a really reliable [00:42:46] id often serves as a really reliable indicator of both sort of base risk [00:42:49] indicator of both sort of base risk level as well as the type of procedures [00:42:51] level as well as the type of procedures being performed at a hospital [00:42:54] being performed at a hospital and this you can think of as like the [00:42:55] and this you can think of as like the analogous to the boat example in the [00:42:57] analogous to the boat example in the fishing problem where if you identify [00:43:00] fishing problem where if you identify hospitals that say have a lot of [00:43:03] hospitals that say have a lot of smokers you're going to much more likely [00:43:05] smokers you're going to much more likely find [00:43:06] find cancer in lung chest x-rays from those [00:43:09] cancer in lung chest x-rays from those types of hospitals and so it's really [00:43:11] types of hospitals and so it's really important to try to remove the effect of [00:43:13] important to try to remove the effect of sort of identifying the hospital and [00:43:14] sort of identifying the hospital and then identifying the base risk [00:43:19] um [00:43:20] um a really interesting one i wasn't aware [00:43:22] a really interesting one i wasn't aware of in image classification until [00:43:24] of in image classification until yesterday or so is um pascal voc [00:43:28] yesterday or so is um pascal voc is a pretty common uh object innovation 
[00:43:30] is a pretty common uh object innovation data set and uh [00:43:33] data set and uh a bias that's been identified is that [00:43:34] a bias that's been identified is that the horse class for this [00:43:36] the horse class for this i guess was taken by a single horse [00:43:38] i guess was taken by a single horse photographer who put in watermarks at [00:43:40] photographer who put in watermarks at the bottom left of the image so around [00:43:41] the bottom left of the image so around 20 of the horse images have a watermark [00:43:45] 20 of the horse images have a watermark and reliable classification systems just [00:43:47] and reliable classification systems just learn to pick up on the watermark so you [00:43:49] learn to pick up on the watermark so you can make cars um classified as horses as [00:43:51] can make cars um classified as horses as long as you add the watermark on the [00:43:53] long as you add the watermark on the bottom right [00:43:55] bottom right and so this is something where unless [00:43:57] and so this is something where unless you really carefully looked at the data [00:43:59] you really carefully looked at the data set you probably won't even realize that [00:44:01] set you probably won't even realize that this kind of bias exists until you've [00:44:03] this kind of bias exists until you've actually um [00:44:05] actually um sort of carefully examined an [00:44:06] sort of carefully examined an adversarially examine the data sets that [00:44:09] adversarially examine the data sets that you have [00:44:13] sort of finally [00:44:14] sort of finally there is [00:44:15] there is i've mostly talked about vision [00:44:18] i've mostly talked about vision examples thus far but this is a [00:44:20] examples thus far but this is a this sort of shortcuts and lack of [00:44:22] this sort of shortcuts and lack of understanding [00:44:24] understanding is a problem that's common to [00:44:25] is a problem that's common to every area and 
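A toy version of this watermark shortcut is easy to reproduce. The sketch below uses entirely synthetic data (not the actual Pascal VOC setup): a plain logistic regression is trained where a "watermark" feature correlates perfectly with the label while the "real" signal is noisy, and the trained model then classifies an ambiguous example purely by the watermark.

```python
import math
import random

random.seed(0)
n = 500

# Synthetic training set: label y = 1 stands in for "horse".
ys = [random.randint(0, 1) for _ in range(n)]
# A noisy "real" signal (stand-in for actual horse appearance)...
reals = [y + random.gauss(0.0, 1.5) for y in ys]
# ...and a "watermark" feature that correlates perfectly with the label.
wms = [float(y) for y in ys]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Plain logistic regression trained by batch gradient descent.
w_real, w_wm, b = 0.0, 0.0, 0.0
for _ in range(500):
    g_real = g_wm = g_b = 0.0
    for y, r, m in zip(ys, reals, wms):
        err = sigmoid(w_real * r + w_wm * m + b) - y
        g_real += err * r
        g_wm += err * m
        g_b += err
    w_real -= 0.5 * g_real / n
    w_wm -= 0.5 * g_wm / n
    b -= 0.5 * g_b / n

# An ambiguous image (real signal = 0): the watermark alone decides.
p_with_watermark = sigmoid(w_wm * 1.0 + b)
p_without_watermark = sigmoid(b)
```

Because the watermark feature is noiseless, gradient descent pushes most of the decision onto it, which is exactly the failure mode described above.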
[00:44:27] I'm going to give a probably very well-known example in natural language processing. The task here is entailment prediction: you're given a pair of sentences, one called the premise and the other called the hypothesis. The first might be a sentence like "the economy could be better," and the second a sentence like "the economy has never been better." The goal is to say: does the hypothesis logically follow from the statement made in the premise? If it follows, you say it's entailed; if it's contradicted, you say it's a contradiction; and if it's neither, that's the neutral class. So it's a three-class classification problem.

[00:45:01] The way these examples are constructed is through crowdsourcing: you extract the premise sentence from some large internet or newswire text, and you have a label that you randomly pick. So you say, "I have a premise, and it's going to be a contradiction," and then you ask crowd workers to write down a contradiction, so they write something like "the economy has never been better."

[00:45:27] What happens here is that crowd workers, because they're writing the hypothesis text after seeing the label, have systematic biases, and the bias that's really strong is negation: when something's not entailed, they use negation. So a model will often learn to associate negation, or the lack thereof, with the outcome label, and instead of actually doing the entailment task, models often pick up these negation biases. Even more problematically, systems with what's called a hypothesis-only baseline, where you don't even look at the premise, can do extremely well. And there's no way to genuinely do well on this task while looking just at the hypothesis, because how can you know that the hypothesis is entailed by the premise while looking at only one of them? So this shows the really strong bias that these crowd workers put into this data set.

[00:46:25] And so this has kind of serious implications for the project of pushing machine learning toward understanding and general AI, because thus far all of machine learning has been predicated on benchmark progress. That's the way in which the field has really grown and done well: ImageNet and MNLI and these well-known tasks, where you get everyone together and push on these numbers, and we hope that improvements in these benchmark performances lead to understanding.
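The hypothesis-only negation baseline described above can be sketched in a few lines. The examples and the negation list below are invented for illustration; real data sets like SNLI and MNLI are far larger, but this kind of trick has been shown to score well above chance on them.

```python
# Hypothetical toy illustration of the hypothesis-only negation shortcut.
NEGATIONS = {"not", "no", "never", "nobody", "nothing", "n't"}

def hypothesis_only_baseline(hypothesis: str) -> str:
    # Deliberately never looks at the premise.
    words = set(hypothesis.lower().split())
    return "contradiction" if words & NEGATIONS else "entailment"

# Invented examples mimicking the crowdsourcing bias: contradictions
# written by crowd workers tend to contain negation words.
data = [
    ("The economy could be better", "The economy has never been better", "contradiction"),
    ("A man is playing a guitar", "Nobody is playing music", "contradiction"),
    ("A dog runs through a park", "An animal is outside", "entailment"),
    ("Kids are sitting at school", "Children are in class", "entailment"),
]
accuracy = sum(hypothesis_only_baseline(h) == label for _, h, label in data) / len(data)
```

On this deliberately biased toy set the premise-blind rule is perfect, which is the point: the labels leak through the hypothesis alone.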
[00:46:54] But it's clear that, because of these biases, that may not necessarily be the case, and so we need a different sort of paradigm to link machine learning performance to understanding.

[00:47:05] The other problem that I hope, by going over so many examples, I was able to impress upon you is that there are so many shortcuts. With this negation bias from crowd workers, you wouldn't know about it unless you looked at the data set carefully after being told that there was a bias; with the watermarks on the horses, I don't even know how they found that, given how minor it is. So it becomes really hard to say "we're just going to construct a data set free of shortcuts." When you're told about these shortcuts afterwards they seem really obvious, but how can you construct a shortcut-free data set?

[00:47:45] And so that's the real challenge now: if we think that we can't get data sets free of these shortcuts, these biases, and these minority groups, we need a new way of trying to make sure that our models really learn the right thing.

[00:47:58] I'm going to stop here for a moment to talk about shortcuts and understanding, and hopefully people have lots of questions, because I think this is a fun one in terms of thinking about how machine learning really relates to AI, and so on.

[00:48:15] Sure. Just thinking about committee modeling, and coming back to that stop sign with the patches that was seen as a yield sign: does it make more sense to use one big model trained on every piece of data you can find, or does it make sense to train a bunch of models on some partitions of the data
that might overlap in some way, and then combine those votes, in order to make it less easily fooled?

[00:49:00] Yeah, I think that's a generally good thing to do. I guess there are two answers, and the more general one is to think about the trade-off between model capacity and your ability to fit these minority groups, or these shortcuts. In the idea you're describing, let's say we have 10 or 100 different models and we fit them to different parts of the data: then we might have a model that's dedicated to shortcuts, but we might also have a model that really learns the right thing. And so the more flexible our model, the bigger our model class, the more we can say that part of the model might be dedicated to the shortcuts, but that's okay, because the rest of our model will still learn the right thing. But that's still a hope; there's no real guarantee that this will happen, and if the shortcuts are strong enough, that's what the model will learn. So I think it seems really important to have bigger-capacity models, that's sort of a given, but how can we learn big models well without overfitting? How can we make sure they still learn the right thing? If one part of the model fits the shortcuts, how can we make sure the rest of it learns to do the right sort of prediction without shortcuts? That's the open question, I think, in this area.

[00:50:15] There was a question: for image segmentation, do we have a way to know which parts of the image contribute to the prediction? We
could call it prediction traceability. Yes, so I of course glossed over quite a bit, but this paper, Lapuschkin et al. 2019, is exactly about that: trying to identify, or attribute, predictions to parts of the image using interpretability methods. That's how they found this horse problem, I think: they attributed predictions to locations and found that for horses the predictions were always localized to the bottom left, and it was because of this watermark. So I think a big, important use case of interpretability methods is exactly this: identifying these kinds of shortcuts by attributing predictions to locations in the image, or to subgroups in the data set.

[00:51:12] Going along with the above comment: are there methods for finding which parts of the image, or of a data example, have high weights associated with them? Yes. There are of course many different methods, but Shapley values are a pretty general framework you can think about. The analogy is like this: think about each pixel, or each region or subpart of the image, as participating in the prediction, and ask: when I remove that part of the image, how much does the prediction accuracy go down? You can do that after randomly removing other parts of the image: you randomly drop other pixels, then you drop the part you're interested in, and you ask, on average, how much more accuracy does the part I'm interested in give me? That estimate is called the Shapley value. And Pang Wei, who I think is here, has a paper on approximations to that based on the influence function. So there are all sorts of ways in the interpretability community of doing what's called feature attribution.

[00:52:16] Another question: for a problem with discrimination, would a reasonable approach be to adopt active learning by default, where the model trains with more emphasis on wrongly categorized examples? The hope is that the model could steer itself away from the initial biases over time, or is it not that simple? Yeah, so that is actually one way around it: when you can collect your own data and you have the ability to know what you're getting wrong, then collecting data in the places where you're wrong serves as a negative feedback loop. You get more data where you're wrong, your model gets the training signal it needs to correct, and eventually you'll learn the right thing.
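Going back to the Shapley-value procedure described a moment ago, randomly dropping other features and measuring the average gain from adding back the one you care about is a Monte Carlo estimate of a Shapley value. A minimal sketch, with a made-up linear "model" standing in for a real classifier:

```python
import random

def shapley_estimate(predict, x, feature, n_samples=200, seed=0):
    """Monte Carlo Shapley-style attribution: average change in the model
    output when `feature` is added to a random subset of the other features
    (removed features are zeroed out, as in the masking described above)."""
    rng = random.Random(seed)
    others = [i for i in range(len(x)) if i != feature]
    total = 0.0
    for _ in range(n_samples):
        kept = {i for i in others if rng.random() < 0.5}  # random coalition
        masked = [v if i in kept else 0.0 for i, v in enumerate(x)]
        without = predict(masked)
        masked[feature] = x[feature]  # add the feature of interest back
        total += predict(masked) - without
    return total / n_samples

# Made-up linear "model" that leans heavily on feature 0 (the "shortcut").
predict = lambda x: 2.0 * x[0] + 0.1 * x[1]
attr_shortcut = shapley_estimate(predict, [1.0, 1.0], feature=0)
attr_other = shapley_estimate(predict, [1.0, 1.0], feature=1)
```

For a linear model the estimate recovers each coefficient times its feature value, so the shortcut feature gets an attribution twenty times larger than the other one; for a real network, a large attribution on a watermark region is exactly the kind of red flag discussed above.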
[00:52:55] It's just that active learning at the scale you need is very, very challenging. Can you actively collect ImageNet-scale data? Very challenging. It's also very challenging to know what parts of the data you're doing badly on: you need to know well enough to say "oh, the part of the data I'm doing badly on is horses without the watermark," and that's a hard thing to be able to say. So you need to know what you don't know, which is almost as challenging as robustness itself.

[00:53:23] Oh wow, there are a lot of questions now. Physicians are experimenting in end-of-life care with AI-based nudging of conversations; do you have any suggestions for patients and doctors? Yeah, that's very challenging. I do think one important thing about these high-stakes settings is to think about the alternative, and the whole decision systems that they're part of.
[00:53:51] Right. So say medical diagnosis, or, to give another example I'm more familiar with, predicting whether or not someone will commit a crime again and so should be released on parole: these are both really high-stakes prediction tasks, and the way they're performed is that there's a machine learning system and a human, and they jointly make a decision. So you need to think about not only the machine learning system, which is what I've talked about here, but also the human part: how they integrate, and how their decisions get combined. I think that's actually the important part, how humans override the machines and how they incorporate the suggestions of the machines, even more so than the predictions themselves, which I think always need to be taken carefully.

[00:54:30] Next one: how can we combine models and objectives to gain greater understanding of the world, and combine them to create intelligent behavior? Yeah, I think this is basically the open question, the unfortunate thing that we don't know how to do, and I think that's what we're struggling with in the robustness and generalization literature: what is the right approach? I think there's not really even a consensus. Is it more data collection, is it smarter ways of training the model, or is it better models? It's not clear yet what the right way is, so unfortunately I don't have a great answer beyond that.

[00:55:04] For shortcuts: people use shortcuts to identify things too, so do we have some way to understand shortcuts based on the training data? Yeah, this is a good point. Humans also use shortcuts, but I think the important thing here is that these learned shortcuts are a lot more crude than the ones we humans use, and at some level they don't even pass basic sanity checks. For entailment, we know that the output should depend on both inputs, but in reality it's only depending on the hypothesis, which is very problematic. What people do today is build these sorts of challenge sets, like examining the performance of a model based on just the hypothesis, and these, which are close to unit tests, help catch these kinds of shortcuts in many cases. So one way to detect shortcuts is things like that: our model shouldn't be sensitive to certain perturbations, or it should be sensitive to certain perturbations, and you go off of those kinds of assertions.
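Those perturbation assertions can literally be written as unit tests. The sketch below is invented for illustration: a deliberately broken hypothesis-only "model" stands in for a real entailment system, and the check encodes the sanity condition mentioned above, that the output must depend on the premise.

```python
# A unit-test-style sanity check: an entailment model's output must depend
# on the premise. `broken_model` is a hypothesis-only stand-in (invented
# for illustration); a real model would be plugged in instead.
def broken_model(premise: str, hypothesis: str) -> str:
    return "contradiction" if "never" in hypothesis.lower() else "entailment"

def premise_sensitivity_check(model, hypothesis, entailing_premise, contradicting_premise):
    """Passes only if the model answers differently for premises that
    should yield different labels for the same hypothesis."""
    return model(entailing_premise, hypothesis) != model(contradicting_premise, hypothesis)

hyp = "The economy has never been better"
passes = premise_sensitivity_check(
    broken_model,
    hyp,
    entailing_premise="The economy has never been better",   # should entail
    contradicting_premise="The economy could be better",     # should contradict
)
```

The hypothesis-only model gives the same answer for both premises, so the check fails it, which is exactly how this class of shortcut gets caught.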
number of shortcuts [00:56:05] model capacity and number of shortcuts employed are larger models more likely [00:56:07] employed are larger models more likely to happen on the correct correlations or [00:56:09] to happen on the correct correlations or are smaller models more likely to use [00:56:11] are smaller models more likely to use shortcuts [00:56:12] shortcuts that's a good question i think [00:56:15] that's a good question i think the general sense that i get from [00:56:16] the general sense that i get from reading the literature is that smaller [00:56:18] reading the literature is that smaller models are more likely to use shortcuts [00:56:20] models are more likely to use shortcuts in many ways um for example in this [00:56:23] in many ways um for example in this paper about sort of watermark [00:56:26] paper about sort of watermark based shortcuts [00:56:27] based shortcuts linear models did a lot worse like they [00:56:29] linear models did a lot worse like they would really pick up heavily on these [00:56:31] would really pick up heavily on these watermarks whereas cnns did so um with [00:56:34] watermarks whereas cnns did so um with less weights and less frequency [00:56:36] less weights and less frequency um and i think generally that's true [00:56:38] um and i think generally that's true that like large capacity models trained [00:56:40] that like large capacity models trained with a lot of data can use some of its [00:56:42] with a lot of data can use some of its capacity just to model the shortcuts and [00:56:43] capacity just to model the shortcuts and it'll still do well on the data without [00:56:45] it'll still do well on the data without shortcuts as long as they exist but [00:56:48] shortcuts as long as they exist but really the key thing here is like you [00:56:49] really the key thing here is like you need to at least see some data without [00:56:51] need to at least see some data without the shortcut pattern [00:56:54] um [00:56:55] um 
How much are you... oh, I already answered that one. Okay, great.

[00:57:00] Yeah, so I want to get into the breakout now, actually, especially because someone asked the question, you know, what's sort of the solution, and are humans robust. So, Woody, if you could drop us into a breakout session for, say, five or six minutes, it would be great to talk about these two questions. The first one is: are the brittleness issues inevitable? What do you think the solutions are, like what are the right approaches? And the second question is: are humans robust? If you think so, then what's the key ingredient that makes humans robust, or more robust than machines?

[00:57:35] Awesome, yes, I'll create the breakout rooms. If everyone wants to take a quick screenshot or try to remember these, I'll post the questions in the chat as well, but they won't be in the
breakout session.

[00:57:53] Great, um, I think I'm unmuted, right? Let's see. Yes, okay, great. All right, excellent. So I'll go through the second part a little bit quicker. I'm glad that we got so many good questions on the first part, which is the more important of the two parts of this talk. The second part is thinking a little bit about how we can do learning, how we can fix these problems, and the kinds of research that Percy and I and others at Stanford have been doing. And so the key problem that I think you should keep in mind with all of this is that the training distribution is very different from the test distribution. This is the root of all evil for these robustness problems. And so we need to think about whether the limitation that we can't generalize from train to test is inevitable, or whether we can come up with some clever data collection
schemes or model-training mechanisms that allow us to generalize. And to do this, we need to think a little bit about how distributions can shift.

[00:58:48] So I'll give you a few definitions. The first one is covariate shift, and this is what you usually think of when people say the distributions are different. Let's say you're making a face recognizer: you have these really nice well-lit portraits in the training data, and at test time you're using it with CCTVs, so all sorts of different environments. But the underlying task is the same, and there should be a single predictor that does well both on portraits and on images cropped from CCTVs.

[00:59:15] Another example is label shift, where the input features look similar, but now the output label distribution has shifted. So for example, if you're making a face recognizer
and at training time you need really precise matching, so you're only going to call detections when the images look exactly the same. But at test time you're making a product for your camera, so it can be a little bit looser, and you might deal with blurry images and so on. The litmus test for this is that you have the same predictor but you're changing your threshold: you're just saying, it's okay if I'm a little bit less confident, I'm still going to make the call. And this is an instance of label shift.

[00:59:50] And the final one, which is basically intractable in all cases, is concept drift. Here you might have a prediction task where you're initially trying to recognize faces of the same people, but then at test time you want to match people across time, like young pictures and old
pictures. The task is fundamentally different depending on whether you're matching the same person or a person shifted in time, and so no one predictor is going to do really well on both of these tasks. There's a fundamental change in the task definition.

[01:00:20] Now I'm going to go over ways to deal with all of these problems. The first one is that we're just going to collect more data; someone asked about this earlier, and this is the key thing: if we get more data, we can do a lot more things. The second part is a little bit more ambitious, and it's going to say, let's try to make do with only the data we have. So the first idea is, let's just say we're going to try to generalize to a test set, and we're just going to collect more data. A classic example of this kind of task is recognizing
digits. So you have images of digits: the left one is MNIST, which is really old; USPS, which is even older, actually; and SVHN, which is more modern, where you recognize digits from house numbers taken in the wild. In all of these cases you need to output the number: this one's a 2, this one's a 7, and so on.

[01:01:09] Now, on collecting new data: let's say we trained a model on MNIST and we want to do well on SVHN at test time, so this is a distribution shift. But we might be able to collect more data; maybe it's unrealistic to say we only have this and we have to predict that. And so what we might have is some unlabeled data from SVHN: we can't afford someone labeling it by hand, but we might just be able to get the images, and this is called unsupervised domain adaptation. And if we can collect labels, that's
called supervised domain adaptation, and that's even better.

[01:01:43] And so we can ask: when can we do learning? If we're in covariate shift and we have this source data, we might be able to do learning, because we have a better model that can adapt to these different kinds of distribution shifts that occur. So the general thing you should think about is that if we're in a covariate-shift setting, and we have source-domain data plus unlabeled data from our target distribution, then we can actually sometimes generalize to our target task even though we don't have labels.

[01:02:12] And so this is the setting that we're going to be talking about for the rest of the talk, the remaining couple of minutes, where we have this covariate-shift problem where the prediction task
is fundamentally the same, and we ask how we can generalize.

[01:02:29] The easiest and most classic thing to do is re-weighting. Let's say we have training data that's 90% frontal images but 10% images taken from the side, and we want to generalize to test data that's 50/50 front and side. So how can we do this? Well, let's just re-weight the data set, so that each frontal-facing image counts for less and each side-facing image counts for more. We've artificially rebalanced the data, and this gives us an assumption-free way of getting estimates of how well our model will perform on this 50/50 test set. And this applies to everything I've talked about before: if your data is imbalanced, has a minority group, or maybe has shortcuts, maybe we can rebalance it to get rid of all these problems.
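The 90/10-to-50/50 re-weighting just described can be sketched as follows. This is a minimal illustration, not a library API: `importance_weights` and `reweighted_error` are hypothetical helpers, and the per-group error rates are made-up numbers.

```python
def importance_weights(train_props, test_props):
    """Weight for each group: p_test(group) / p_train(group)."""
    return {g: test_props[g] / train_props[g] for g in test_props}

def reweighted_error(group_errors, train_props, weights):
    """Estimate test error by re-weighting per-group training error rates."""
    return sum(train_props[g] * weights[g] * group_errors[g] for g in group_errors)

# Training data: 90% frontal, 10% side; test data: 50/50.
w = importance_weights({'front': 0.9, 'side': 0.1},
                       {'front': 0.5, 'side': 0.5})

# Hypothetical per-group error rates of a fixed model on the training data.
est = reweighted_error({'front': 0.02, 'side': 0.10},
                       {'front': 0.9, 'side': 0.1}, w)
# est works out to 0.5 * 0.02 + 0.5 * 0.10 = 0.06: each side-facing image
# now counts five times as much, and each frontal image counts for less.
```

Note how the weight for the side-facing group is 0.5 / 0.1 = 5, which is exactly the "counts for more" factor from the talk.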
So, are we done? No, because even though I talked about it that way, suppose we restructure the data set so that the training data is 100% men and the test data is 100% women. There's no overlap, and if we try to re-weight this data we're going to get infinite error, because we need to infinitely upweight the women that we don't have in our training data. This is the fundamental problem with all of these approaches: when you don't have any overlap, your estimates all blow up and go to infinity. And in the real world, everything is non-overlapping, so usually these kinds of estimates don't work.

[01:03:46] But intuitively we might think these kinds of tasks are possible, and the reason why you and I and many others think this is possible is this intuition: let's say we have training data that's blue images and test data that's orange images.
Clearly there is no overlap between any of these images; they're in different color channels. But if we desaturate the images, we can perform prediction on the desaturated image, and we'll get really good performance. And so the intuition is that if we can't distinguish the two domains, because we've desaturated the images, we might do really well. This is the idea behind most of modern domain adaptation, where you learn to represent your data in a space that doesn't change when you go from the training to the test distribution. You measure how much your data changes in this higher-level representation, and if your data is close, then you're going to do really well.
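The desaturation intuition can be made concrete with a toy sketch. Everything here is invented for illustration (the pixel values, the light-vs-dark task, and the helper names): mapping both domains to grayscale is a hand-built version of the domain-invariant representation just described.

```python
def desaturate(pixel):
    """Grayscale value of an (r, g, b) pixel: a representation that
    throws away the color information separating the two domains."""
    r, g, b = pixel
    return (r + g + b) / 3.0

def classify_brightness(pixel, threshold=0.5):
    """Toy predictor that only looks at the desaturated value, so it
    behaves the same on blue-tinted and orange-tinted inputs."""
    return 'light' if desaturate(pixel) >= threshold else 'dark'

# A "blue domain" training pixel and an "orange domain" test pixel with
# the same underlying brightness (made-up values, channels swapped).
blue_pixel = (0.3, 0.5, 1.0)
orange_pixel = (1.0, 0.5, 0.3)
```

The two pixels are trivially distinguishable in raw RGB, but after desaturation they map to the same value, so any predictor built on top of that representation treats the two domains identically.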
And the thing to keep in mind is that your test performance is going to be your training performance plus some sort of distance that measures how different the training and test distributions are, and if you keep this small, you're going to do really well.

[01:04:53] So you can think about this very simply: the test error of a model is the source performance, how well we do on the training data, plus the gap between train and test. And the idea is that we're going to look for a domain distance where, no matter what model we pick, we're going to do well because the distributions look so similar. If images are desaturated and they look identical, it doesn't matter what the model is; it performs identically on both. And so if we do well on the source domain and the domain distance is low, we might be able to generalize. And this is really interesting and optimistic: these all seem like things that we can measure and think about, and they give us conditions
under which we might be able to do well on a test distribution. And there's been a lot of work over the last almost two decades now on these kinds of domain distances and bounds and how you can learn from unlabeled data. They give great intuition and let you think carefully about these kinds of problems, but unfortunately, if you try to actually compute these bounds to get a guarantee on test error, they're usually vacuous: you'll get things like "the accuracy will be greater than zero" and "the error will be less than one", which is not super helpful.

[01:06:05] And just to go over how these kinds of things often work in practice: the domain distance I just described is the basis for a lot of modern domain adaptation methods, and the key idea is that neural nets are everywhere because neural nets work, and you use
them as a way to measure the domain distance. The idea here is that you have one part of your model maximizing performance on the training data, and you have another part of the model making sure that you can't distinguish between the training and the test distribution in what's called a bottleneck feature space: at this level your training and test data should be indistinguishable, and yet useful for actually doing the task.

[01:06:47] And so we have this hope: we have this re-weighting approach, which has no model dependence, and these model-based domain distances, which require us to carefully construct a neural network. But on the other hand, that's the only way we can get these things to work in the real world, because with re-weighting these weights are often infinite.
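A crude, hedged sketch of the "use a classifier to measure domain distance" idea just described: here the discriminator is a one-dimensional threshold sweep rather than a neural network, and all the feature values are made up. Accuracy near 1.0 means the domains are easy to tell apart; accuracy near 0.5 (chance) means the representation is domain-invariant.

```python
def domain_classifier_accuracy(source_feats, target_feats):
    """Best accuracy of a threshold rule at telling source (label 0)
    from target (label 1) given one scalar feature per example; a
    stand-in for the neural domain discriminator."""
    data = [(f, 0) for f in source_feats] + [(f, 1) for f in target_feats]
    best = 0.0
    for thr, _ in data:
        for target_is_above in (True, False):
            correct = sum(
                ((f >= thr) == target_is_above) == (lbl == 1)
                for f, lbl in data
            )
            best = max(best, correct / len(data))
    return best

# Raw color features separate the two domains perfectly...
raw_source = [0.1, 0.2, 0.3]   # made-up "blueness" scores
raw_target = [0.7, 0.8, 0.9]

# ...but after a desaturation-style mapping, the domains overlap exactly.
inv_source = [0.4, 0.5, 0.6]
inv_target = [0.4, 0.5, 0.6]
```

In the raw features the classifier reaches perfect accuracy (large domain distance), while on the invariant features it can do no better than chance, which is exactly the property the bottleneck feature space is trained to have.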
So there's no free lunch: we either need model assumptions or assumptions about overlap between the domains. But if we have one of those, plus unlabeled data on the test domain, we can actually sometimes do well.

[01:07:20] The other approach, which I'll go over a little quickly because I'm running low on time, I can describe at a very high level with this idea. As I said before, the main issue is that our training distribution and our test distribution are different. If they were the same, we'd be done, but they're not. But what if I told you: I'll give you a list of 100 possible test distributions, and your true test distribution is going to be one among these hundred. Then we can train a model to do well on all of these: we just go through each one of these test sets and say our model has to do well on the worst one, and if we can get a model that works on all of them, we know that our model is going to do well
on the true test set. So this is thinking about a potential set of test distributions and considering the worst case, and this is what's called a min-max optimization problem. We're going to find the best model (that's the min part) that works well over the worst possible potential test set (that's the max part). This idea is going to work whenever the true test set is contained in this uncertainty set Q: we're taking the worst case over this big Q, which is the set of potential test distributions. And this fails whenever Q is too small or too big. If Q doesn't contain the real test distribution, you've got no guarantees; if Q is so big that it contains everything, then your model is going to be so pessimistic, because it has to be prepared for any possible distribution, that it's just going to predict 50/50 or something vacuous for all of your inputs.
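The min-max idea can be sketched with a tiny uncertainty set Q. Everything here is made up for illustration: three candidate test distributions over two groups, and two candidate "models" given directly as hypothetical per-group error rates. A real version would optimize model parameters, not pick a name from a dictionary.

```python
def worst_case_error(group_errors, uncertainty_set):
    """Max part: the model's error under the worst distribution in Q."""
    return max(
        sum(p * group_errors[g] for g, p in dist.items())
        for dist in uncertainty_set
    )

def minimax_pick(models, uncertainty_set):
    """Min part: choose the model with the smallest worst-case error."""
    return min(models,
               key=lambda name: worst_case_error(models[name], uncertainty_set))

# Q: the true test distribution is promised to be one of these.
Q = [
    {'front': 0.9, 'side': 0.1},
    {'front': 0.5, 'side': 0.5},
    {'front': 0.1, 'side': 0.9},
]

# Hypothetical per-group error rates for two candidate models.
models = {
    'average-case': {'front': 0.01, 'side': 0.30},  # great on 'front', weak on 'side'
    'robust':       {'front': 0.05, 'side': 0.06},  # slightly worse everywhere, no weak group
}
```

The average-case model wins under the 90/10 training-like distribution but collapses under the side-heavy one, so the min-max criterion prefers the robust model, whose worst case over Q stays small.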
[01:08:48] I'm going to skip over a few of the examples and go to this slide, and say that these kinds of ideas can be applied in each one of the settings that I described before: minority groups in fairness, adversarial examples, or shortcuts, by carefully choosing the kinds of worst-case groups. In the case of minority groups, we basically explicitly list out all the possible minorities that we care about, and we consider the worst-case performance over all of those. For adversarial examples, we know that the images before and after perturbation are close, so we consider all distributions that are nearby each other in pixel space, and then we optimize for the worst case over those. And then for shortcuts, what we can do is explicitly construct groups that don't contain some of these shortcuts, and we enumerate all such groups
and then make sure that these worst-case groups work well. So for example, if we have a model that relies too much on backgrounds, we construct subgroups of the data that have mismatching backgrounds and objects.

[01:09:52] So, to basically wrap up here: the limits of this kind of approach are that if we pick too small a worst-case group we get no robustness, and if we pick too broad a worst-case group we get vacuous models, and there's no simple or general principle for designing these losses, even though this approach gives us really nice ways of thinking about and optimizing models for the worst case and getting guarantees.

[01:10:13] Okay, I'm going to wrap up there. If anyone has questions, I would be happy to answer them; I can stay for a little bit longer and chat if people have questions.

[01:10:24] All right, maybe we should just thank Tatsu for his really insightful and interesting
And then maybe run off if they have to go. So thanks, Tatsu. [01:10:39] Thank you; really a lot of interesting things, food for thought. So I hope that everyone has had their eyes opened with respect to all the different problems that we're seeing, and hopefully you're motivated to help solve some of these, because I think there are a lot of interesting open research questions here.

================================================================================ LECTURE 053 ================================================================================
Fireside Talks: State of Robotics I Automation and Robotics Engineering Lectures - Stanford
Source: https://www.youtube.com/watch?v=hVsR9DdR3qE
---
Transcript

[00:00:05] Hi everyone. Okay, let's start 221, the second lecture, or rather the second week, of this quarter. Yeah, hi, I'm Dorsa; you saw me last time. I'm co-teaching this class with Percy, and today our plan is to talk a little bit about robotics. This is going to be kind of an informal introduction to robotics: a little bit of history, a little bit of
state of the art, some cool videos, and a bit of chat. I don't need to finish my slides; I have a lot of slides and I'm probably not going to finish them, so feel free to interrupt me at any point in time if you have questions about anything. We can make this an informal discussion, and I will probably not cover everything that is in the slides anyway.

[00:00:53] All right, so let's get into some quick logistics. This was our plan for the class; I'm sure you've seen this from the last lecture. The plan was to start with reflex-based models, and we've kind of already started that: Percy basically went over the machine learning components last week, and we also have another week of machine learning and a little bit of deep learning. Then, starting next week, we are going to talk about state-based models, and I'm going to cover a lot of that: we're going to do search, MDPs, and games. I'm going to reuse a lot of videos from last year, just giving you a heads up; we'll probably add and remove some, but last year's videos are there, we have a whiteboard, everything is great, so I'll probably reuse a lot of that, but we basically plan to break them into modules. Then Percy will cover variable-based models, and I will cover logic and finish the class. So that was just a quick overview of what the plan is.

[00:01:55] If you remember, Monday lectures are not modules, right? Monday lectures are going to be guest talks and chats and having fun. So, just to give you an idea of the Monday lectures, this is a tentative schedule. You've already had the introduction to AI; Percy did that. Today I will be doing this talk on the state of robotics, talking a little bit about what that is and why you should care about it. Then next Monday we have a guest speaker, Mariano-Florentino Cuéllar. Spoiler: he is a faculty member in the law school, and he's also on the California Supreme Court, so it should be a fun talk to attend. He does a lot of work around AI and law; he actually teaches a class on regulating AI, and he has a lot of interesting opinions on that, so I totally recommend showing up for that. I think it would be a lot of fun.

[00:02:46] Then the week after we have Tatsu Hashimoto. Tatsu is a new faculty member in the CS department; he does a lot of work around robust machine learning, so he'll probably be talking about that. That's followed by Percy talking about the state of natural language processing in week five. By the way, this is tentative, so do not quote me on it; some of these dates might change. I think the
speakers are accurate, but the dates might move around. And then finally, in week six we have Emma Pierson talking about AI, equality, and healthcare. She'll be a faculty member at Cornell Tech, joining next year, so it will be interesting to hear from her.

[00:03:22] Week seven is kind of like a fun chat: Percy and I will just show up and you can ask us anything, basically. We plan to talk about things like grad school, so if you have any questions about that, about research, or about what to do after 221, I think that's a good week to attend. In week eight we have Yoav Shoham. He was a faculty member at Stanford, and he has done a lot of work in AI from back in the day, through the whole revolution of AI, so it would be really fun to show up for his talk. And then in week nine we have Drago Anguelov. Drago is the head of autonomous driving at Waymo, so if you're interested in learning about autonomous driving, to the extent that he can talk about it, that would be week nine. And then in week 10 I will do a conclusion and wrap up the class. So that is kind of our plan for the Monday lectures; I just wanted to advertise it so you have an idea of what will show up in the next couple of weeks.

[00:04:24] All right, any questions? If you have any questions, by the way, just put them in the chat or interrupt. All right, so today we want to talk about robotics, and I just wanted to start it off; I have a lot of videos today, so it'll be fun. I wanted to start off with a video showing that robots can dance, just to advertise this.

[00:04:48] [Music] You guys can hear this, right? Yeah, okay. [00:05:01] [Music]

[00:05:20] Anyway, I wanted to start with this video just to motivate why we care about robotics. This is Spot from
Boston Dynamics. Boston Dynamics is a company that does a lot of really cool robotics work; we'll see more of their robots later in this lecture. But robots can dance, they're cool, and let's just start the conversation with this question: what is a robot, and when is it that we call something a robot? So, a question that I have, and I think it would be a good starter, is for you to go into breakout rooms for two to three minutes and chat about this. The question is: think about a hammer. Is a hammer a robot? What do you think? And think about Google, or Google Home, or Siri: is that a robot? So what defines a robot? That's just a starter to get you thinking about what a robot is and why we should care about it. Go to the breakout rooms for two to three minutes, introduce yourself, talk to the people in your breakout room, and then put your answers in the chat when you come back, and we'll continue.

[00:06:28] Yes, I believe everyone's back. All right, I hope you just met your friends and other people in the class, and if you have thoughts, put them in the chat; we'll look at them later. What is a robot? But let's actually continue with our talk today, because again, I have a lot of videos.

[00:06:46] So my plan for today is to do a bunch of things. I'm going to start with a quick history of robotics: where did it come from, and why do we care about it? Then I'm going to spend a bit of time talking about why you should care about robots, and why this class should care about robots: how are robots related to AI, and what is their relationship? And then I wanted to spend a little bit of time talking about robotics at Stanford: what research is done at Stanford, and who are the faculty who do robotics here, just so you know the faces, you know what classes to take, what type of research is done, and how you can get in touch with them. This is probably as far as I'll get, but if I have time I'm going to talk a little bit about some exciting robotics applications, all the awesome things that are happening, and also the not-so-awesome things: the fact that robots are far from perfect, or not there yet. And then if I have time, which is very likely not to be the case, I will talk a little bit about my own research around interactive robotics.

[00:07:49] So again, the rule is: at any point in time, interrupt, just ask questions, raise your hand, and we'll go from there. Let's just jump into this quick history of robotics: where does it come
history of robotics where does it come from where does the word robot even [00:08:05] from where does the word robot even comes from [00:08:06] comes from so the word robot actually is kind of [00:08:08] so the word robot actually is kind of old it comes from this display from carl [00:08:11] old it comes from this display from carl catholic in 1921 the play is called the [00:08:13] catholic in 1921 the play is called the rosen's universal robots and it's about [00:08:16] rosen's universal robots and it's about basically this mechanical men that are [00:08:18] basically this mechanical men that are built in factory and are supposed to do [00:08:20] built in factory and are supposed to do work for humans and then they rise [00:08:23] work for humans and then they rise against humans so that's kind of like [00:08:25] against humans so that's kind of like the part of it um and and basically it [00:08:28] the part of it um and and basically it has a little bit of a dystopian view of [00:08:30] has a little bit of a dystopian view of that and and the word these mechanical [00:08:32] that and and the word these mechanical men basically are are called robota [00:08:35] men basically are are called robota which basically means slave or kind of [00:08:38] which basically means slave or kind of like labor type work in czech i don't [00:08:41] like labor type work in czech i don't know if anyone knows czech and i don't [00:08:42] know if anyone knows czech and i don't know how accurate that is but that's [00:08:44] know how accurate that is but that's basically what i read on wikipedia so i [00:08:46] basically what i read on wikipedia so i assume that is that is accurate [00:08:48] assume that is that is accurate um so that was the word robot but [00:08:50] um so that was the word robot but then the word robotic was also first um [00:08:53] then the word robotic was also first um introduced by this guy isaac asimov [00:08:56] introduced by this guy isaac asimov who is a 
science fiction writer and then [00:08:58] who is a science fiction writer and then he came around 1950s and he wrote a [00:09:01] he came around 1950s and he wrote a bunch of books about robots and and the [00:09:03] bunch of books about robots and and the view of it was a little bit nicer it was [00:09:05] view of it was a little bit nicer it was a little bit friendlier than this [00:09:06] a little bit friendlier than this dystopian view and and he talked about [00:09:09] dystopian view and and he talked about these different rules of robotics the [00:09:11] these different rules of robotics the robots were there to help humans and [00:09:13] robots were there to help humans and they were supposed to like follow these [00:09:14] they were supposed to like follow these different rules so you might you guys [00:09:16] different rules so you might you guys might have heard of these three rules of [00:09:18] might have heard of these three rules of robotics by azek asimov so the first one [00:09:21] robotics by azek asimov so the first one is that a robot may not injure a human [00:09:23] is that a robot may not injure a human being or through inaction allow a human [00:09:26] being or through inaction allow a human being to come to harm [00:09:28] being to come to harm the second one is a robot must obey the [00:09:30] the second one is a robot must obey the orders given given it by human beings [00:09:32] orders given given it by human beings except for each or where such orders [00:09:35] except for each or where such orders with conflict with the first law [00:09:38] with conflict with the first law and then the last one is a robot must [00:09:40] and then the last one is a robot must protect its own existence as long as [00:09:42] protect its own existence as long as such protection does not conflict with [00:09:43] such protection does not conflict with the first and second one okay so kind of [00:09:46] the first and second one okay so kind of like [00:09:47] 
like this is this is obvious right like you [00:09:48] this is this is obvious right like you don't want the robots to to go against [00:09:50] don't want the robots to to go against humans and you don't want the robot to [00:09:52] humans and you don't want the robot to kill itself and and these are kind of [00:09:55] kill itself and and these are kind of the three rules and and the reason i'm [00:09:57] the three rules and and the reason i'm mentioning this is that people are [00:09:58] mentioning this is that people are coming back to these rules like even [00:10:00] coming back to these rules like even these days like when you're people are [00:10:02] these days like when you're people are thinking about robots they feel like oh [00:10:03] thinking about robots they feel like oh a robot should try to satisfy these [00:10:05] a robot should try to satisfy these three laws of robotics and these are [00:10:07] three laws of robotics and these are kind of like the core laws that need to [00:10:09] kind of like the core laws that need to be satisfied [00:10:10] be satisfied and and the thing about these laws is [00:10:12] and and the thing about these laws is sure it is nice but they're kind of [00:10:14] sure it is nice but they're kind of obvious right and actually satisfying [00:10:17] obvious right and actually satisfying them is the most difficult part and it [00:10:19] them is the most difficult part and it doesn't really like go go through that [00:10:21] doesn't really like go go through that like like if i'm if i have my robot [00:10:23] like like if i'm if i have my robot running gradient descent on the loss [00:10:25] running gradient descent on the loss function how do i define that loss [00:10:27] function how do i define that loss function accurately so i satisfy these [00:10:29] function accurately so i satisfy these these rules like that is not a very [00:10:31] these rules like that is not a very obvious thing and then i guess get it in [00:10:33] 
[00:10:35] Let me give you an example. Let's say I have Rosie the robot that's supposed to clean my house, or I have a Roomba that's supposed to clean my house, and let's say I have built this super nice, intricate house of cards. Any human who was helping me clean my house would know that you shouldn't touch this house of cards, because I spent so much time on it and it is so valuable to me; you shouldn't go and clean it up. But a robot wouldn't know that. Why would it? Why would it know what a house of cards is, or how much energy I've put into creating it? So that is kind of the objective-function problem. And sure, it's not about harm exactly; well, it might be harming me, it's harming humans; but in general, thinking about what objective the robot should satisfy, what the reward function should be (we'll be talking about MDPs and reward functions in a couple of weeks) is actually a very difficult problem. This is an active area of research: trying to understand what the human preferences are, what humans actually want robotic agents to do around them, and at the same time what the robot thinks those preferences are. There's always going to be a mismatch between the two, and how harmful is that mismatch going to be? How unsafe is it that the robot doesn't know everything?

[00:11:53] And you might ask, well, why don't you just write that as an objective: "hey robot, don't go and destroy Dorsa's house of cards"? That's perfectly fine, but the thing is, it is really hard to write out all these specifications, all these properties that we would want satisfied in the world, because there's just so much context, so much information in the world. Humans just know these things; as they grow up, they learn them; and robots wouldn't necessarily know them. And you might say, that's just data, so let's just learn it from data. A lot of people agree with that and a lot of people disagree with that; it's still a point of debate whether it's just data, whether I just need to show the robot more examples so it knows that houses of cards are important to people. But in general, the point of this slide is: hey, these laws of Asimov's that are put out there are not that obvious to satisfy. Sure, I can say "don't kill humans," but it's not really obvious how I write out what it means to not harm humans.
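The specification problem above can be made concrete with a toy sketch. This is a hypothetical example of my own, not anything from the lecture: a cleaning robot whose reward only counts mess removed will prefer to flatten the house of cards, because nothing in its objective says not to.

```python
# Toy example of reward misspecification (hypothetical names and numbers).
def misspecified_reward(state):
    # The designer only wrote down "less mess is better".
    return -state["mess"]

def true_preference(state):
    # What the human actually wants: clean, but never touch the card house.
    return -state["mess"] - 100 * state["card_house_destroyed"]

# Two outcomes the robot compares under each objective.
leave_cards = {"mess": 1, "card_house_destroyed": 0}
clean_everything = {"mess": 0, "card_house_destroyed": 1}

# Under the misspecified reward, destroying the cards looks strictly better;
# under the true preference, it is far worse.
print(misspecified_reward(clean_everything) > misspecified_reward(leave_cards))  # True
print(true_preference(clean_everything) > true_preference(leave_cards))          # False
```

The gap between the two functions is exactly the mismatch the lecture describes: the robot optimizes the objective it was given, not the one the human had in mind.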
want to make here [00:12:57] and the second point i want to make here is that even that is still under [00:13:00] is that even that is still under question then not harming humans it's [00:13:02] question then not harming humans it's actually not obvious to everyone that we [00:13:04] actually not obvious to everyone that we shouldn't use robots not harm humans [00:13:07] shouldn't use robots not harm humans which is a little bit silly in my [00:13:08] which is a little bit silly in my opinion but i'm just talking about [00:13:10] opinion but i'm just talking about everyone's opinion here [00:13:12] everyone's opinion here so like if you think about if you think [00:13:14] so like if you think about if you think about like our defense or other other [00:13:16] about like our defense or other other countries defense right like uh people [00:13:19] countries defense right like uh people use these things called autonomous [00:13:21] use these things called autonomous weapon systems which are basically like [00:13:24] weapon systems which are basically like you can have drones that can detect [00:13:26] you can have drones that can detect targets and shoot at them you can have [00:13:28] targets and shoot at them you can have detail autonomous weapon systems these [00:13:30] detail autonomous weapon systems these are commonly referred to laws [00:13:33] are commonly referred to laws by laws and lethal autonomous weapon [00:13:35] by laws and lethal autonomous weapon system basically and there it's a big [00:13:37] system basically and there it's a big question should we use these should we [00:13:39] question should we use these should we not use this when can we use like we [00:13:41] not use this when can we use like we thought autonomous weapon systems and [00:13:43] thought autonomous weapon systems and and yeah it's not like it is here it's [00:13:45] and yeah it's not like it is here it's not a thing that's science fiction it's [00:13:47] not a thing that's 
science fiction it's actually a thing that uh our like [00:13:50] actually a thing that uh our like basically u.s defense has it like like [00:13:52] basically u.s defense has it like like the whole like all over different [00:13:54] the whole like all over different countries have some version of this [00:13:56] countries have some version of this and and the question is like yeah if we [00:13:58] and and the question is like yeah if we don't use it like other countries might [00:14:00] don't use it like other countries might use it and how do we think about um like [00:14:02] use it and how do we think about um like using or not using these systems like [00:14:04] using or not using these systems like how do we put a ban event on it what [00:14:05] how do we put a ban event on it what does a ban on it mean [00:14:07] does a ban on it mean um and there's a lot of um debate around [00:14:09] um and there's a lot of um debate around this start russell who's a faculty at uc [00:14:12] this start russell who's a faculty at uc berkeley and he also features a either [00:14:14] berkeley and he also features a either he is he's basically a proponent of [00:14:16] he is he's basically a proponent of banning or fully banning and with [00:14:18] banning or fully banning and with autonomous weapon systems and he has a [00:14:20] autonomous weapon systems and he has a lot of interesting talks around this we [00:14:22] lot of interesting talks around this we will talk about this a little bit more [00:14:24] will talk about this a little bit more toward um like in in the conclusion [00:14:26] toward um like in in the conclusion lecture but this is also something i [00:14:28] lecture but this is also something i wanted to mention because it could be an [00:14:29] wanted to mention because it could be an interesting topic to talk to tino about [00:14:31] interesting topic to talk to tino about next week like when you think about laws [00:14:33] next week like when you think about 
[00:14:35] When you think about laws and regulating these things, how do the regulations actually work and make sense? So yeah, even not harming humans, which is what Isaac Asimov said, is still under question here; it's not clear that that is what we want to do.

But okay, let's go back to the history of robotics: why do we have robotics, and when did it start? Around the 50s and 60s there was a lot of excitement around AI. Percy was talking about the history of AI last week, and that was the time when there was a ton of excitement. Even Turing has a paper where he writes that the best thing we can do is to build a robot with TV cameras for its eyes and motors for its legs, and have it run around the countryside and learn from the world. So this is from back in Turing's time, and this is what he was
thinking. [00:15:27] And even in the same paper, later on, Turing says: well, this is too difficult, I don't want to deal with this physical interaction with the countryside, so instead maybe we should focus on the problem of intelligence, maybe we should focus on AI. And that is how the next 50 years came to be all about AI and building good AI systems. There was a lot around robotics too; sure, robotics has also seen a lot of advances since the 50s, but a lot more has happened on the AI side of things, just because the robotics side was so difficult.

An example of that is Deep Blue, which won its first game against the world chess champion in 1996. It was doing amazing intelligence, right? It was able to win at the game of chess. But the thing that was
happening was that the chess pieces [00:16:17] were moved by humans, because that part was so difficult. Grasping is still so difficult when you think about a robot trying to actually move these pieces, and that was not solved in 1996 in any way at all.

All right, so when did the first robot come? I've been talking about this history, and the question is: what was the first robot out there, the first intelligent robot? The first intelligent robot was Shakey. I have a video of it here; it's about a five-minute video, so it's a little bit long, but let's just watch it. I think it has a lot of interesting history in it.

[00:16:53] Shakey was the world's first mobile intelligent robot, embodying numerous breakthroughs in artificial intelligence, robotics, computer vision, navigation, and other research areas. The robot was
developed from 1966 to 1972 [00:17:11] by SRI International, then called Stanford Research Institute, and its legacy and impact are still very much alive today. Shakey is really the great-grandfather of things like self-driving cars and even military drones. The hardware was really pretty primitive, but the software architecture and the software algorithms are what changed the world. I think we all thought we were doing really interesting stuff, so it didn't really dawn on us that we were doing anything special. Shakey established a position about what we should be thinking about as possible, as feasible.

To understand why Shakey is so important, we have to go back to 1966, and we have to understand where AI research was at that time. Well, you have to remember that it was pretty much a green field when Shakey started. All over the country, and even outside of the United States, people were beginning
to build the components of artificial intelligence. [00:18:07] Nobody had tried, at the time that Shakey was launched, to integrate all the components of AI and robotics into a single moving vehicle that could reason about the world, could sense the world around it, and could take actions. Prior to 1966 there were no robots, or at least no intelligent ones.

[Music]

The concept of an intelligent robot was limited to the realm of fiction: "You will meet a charming character in the robot, always at your service." When you read the title of the original proposal, it was something like "a mobile automaton for reconnaissance," and the reason we called it an automaton was that, until Shakey, you couldn't go to a funding agency and say, I want money to make a science-fiction kind of device. So we needed a name, and finally Charlie, in his inimitable
fashion, said: [00:19:06] it shakes like hell when it moves, let's just call it Shakey.

Key components of Shakey's hardware were a TV camera to observe its environment, an antenna radio link, bump detectors, and a push bar to move objects. My role was mainly to get the images and get whatever coordinates they needed to determine where they were, and, you know, extract the information from the image. I remember when I first saw it: gee, that looks like a dishwasher on wheels. While charming, Shakey wasn't impressive for its looks; it was the AI and programming advancements that made it famous. We structured Shakey's software in four distinct layers, and that was the first time the layered architecture was used for robots. Shakey's pioneering software architecture paved the way to a new era of AI and robotics. The SRI team later developed Flakey, a research robot that demonstrated
fuzzy logic and goal-oriented behavior. [00:20:07] Then came Centibots, one of the earliest projects in swarm robotics, where 100 autonomous robots demonstrated the ability to map a complex area collaboratively. I like how it's code that isn't just turning numbers into other numbers; you get to see the thing come to life right next to you.

Shakey also inspired research in natural-language-based interaction, leading to the popular speech-based technologies that we use today. Shakey's breakthroughs in computer vision are now used to help drivers stay in their lanes. And every time you get driving directions on your phone or navigation system, you are benefiting from the A* navigation algorithm, first invented for Shakey. Even NASA's Mars exploration rovers use navigational techniques that were first launched with Shakey. The future is things like potentially
having teams of autonomous aircraft [00:21:05] that could go out, for example, and do firefighting, either fully autonomously or potentially in tandem with human-piloted aircraft that can go out and work with them collaboratively.

Shakey now resides in the Computer History Museum, visible to hundreds of thousands of visitors annually, and in 2017 Shakey was honored with an IEEE Milestone achievement award. The Shakey milestone is important because, first of all, Shakey is the world's first mobile intelligent robot. In addition, this is the first IEEE Milestone in the areas of either robotics or artificial intelligence. Looking back, more than 50 years after the Shakey project began, it's inspiring to see how one small team can make such an impact, how one ambitious idea continues to benefit our lives, how one robot changed the world. We didn't realize, I think, any of us, what the significance of this
was. [00:22:02] We knew we were the first, but nobody knew where it was going, and I don't think any of us would have predicted what happened. Shakey planted the flag way out there. It's a model of the kind of ambitious projects that we should be looking at in the future.

[00:22:19] All right, so that was Shakey's video. Shakey is actually at the Computer History Museum just down the road, so when things open up, I suggest going there and seeing it. Cool, so that was my quick history of robotics. Any questions? Thumbs up, just feel free to ask, or save it for later.

So in the next part, what I'd like to do is talk a little bit about how this is related to some of the topics that we're learning in this class: how are robots in general using ideas from AI? I want to spend a little bit of time on that. So,
if you think about robotics, [00:23:02] there's this common architecture that is usually used for robots. It's more under question these days, but back in the day this was the architecture that a lot of robots tended to use, which is the sense-plan-act architecture, and then looping over that. So you sense the world, you perceive the world, you do perception, and from that you try to plan what to do next; that is where the intelligence lies. And after that you just act, you execute that plan. Once you've acted, you can go back and sense, plan, and act again. That's a very common architecture that most robots use.

These days people are thinking about a more intertwined relationship between sense, plan, and act. For example, you shouldn't just sense the world for the sake of sensing; sensing
needs to be active. [00:23:54] So there's this area called active perception, which is about the idea that I only sense the parts I care about and need to act on, and we should have this intertwined relationship between acting and sensing. Or there's another paradigm these days that basically tries to go from images directly to actuation, kind of skipping the planning part by replacing it with neural networks. So if I have a machine learning system and I start from images, can I directly get a control for my robot? That's another paradigm; I'll actually talk about that a little bit later in this section. But for now, let's just consider this paradigm of sensing, planning, and acting.
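The sense-plan-act loop she describes can be sketched in a few lines of Python; `EchoWorld` and the one-dimensional "move toward a target" planner are hypothetical stand-ins for illustration, not any real robotics API:

```python
# A minimal sketch of the sense-plan-act loop; EchoWorld and plan() are
# made-up illustrations, not a real robot interface.

class EchoWorld:
    """Toy world: the robot's state is a single integer position."""
    def __init__(self, state=0):
        self.state = state

    def sense(self):            # SENSE: perceive the current world state
        return self.state

    def act(self, command):     # ACT: execute the planned command
        self.state += command

def plan(observation, goal):
    """PLAN: decide the next command from the latest observation."""
    if observation < goal:
        return 1
    if observation > goal:
        return -1
    return 0                    # already at the goal

def sense_plan_act(world, goal, max_steps=100):
    for _ in range(max_steps):  # loop: sense -> plan -> act, then repeat
        obs = world.sense()
        cmd = plan(obs, goal)
        if cmd == 0:
            return obs
        world.act(cmd)
    return world.sense()
```

Calling `sense_plan_act(EchoWorld(0), goal=5)` walks the toy robot to its target one step at a time; the "intertwined" variants mentioned above would, in effect, let the planner also choose what to sense next.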
[00:24:40] In this class, starting next week, we're going to first talk about search, and actually, as you heard in the video, we're going to talk about algorithms like A*. That's something we will discuss next week, and A* was introduced for Shakey. It's basically an extension of Dijkstra's algorithm that adds a heuristic, it's fairly fast, and it was introduced for things like robots moving around and navigating. Today we use a lot of sampling-based techniques, so the algorithm that you see running here is called RRT*, which is similar to A* to some extent but is a sampling-based algorithm: it creates a dense tree and navigates along the right branches of that tree. So the type of thing we will be talking about next week, search, actually falls largely under planning for robots: when you're planning for robots, you are searching in that space.
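Since A* comes up next week, here is a minimal sketch of the "Dijkstra plus a heuristic" idea; the grid, walls, and Manhattan-distance heuristic below are invented for illustration, not Shakey's original code:

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A*: Dijkstra's algorithm plus a heuristic h(n) that lower-bounds
    the remaining cost to the goal."""
    frontier = [(h(start), 0, start)]          # entries are (f = g + h, g, node)
    best_g = {start: 0}
    came_from = {}
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:                        # reconstruct the path
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return path[::-1]
        if g > best_g.get(node, float("inf")):
            continue                            # stale queue entry
        for nxt, cost in neighbors(node):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                came_from[nxt] = node
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None                                 # goal unreachable

# Toy 3x3 grid with a small wall; the robot must route around it.
walls = {(1, 0), (1, 1)}
def grid_neighbors(p):
    x, y = p
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < 3 and 0 <= ny < 3 and (nx, ny) not in walls:
            yield (nx, ny), 1

path = a_star((0, 0), (2, 0), grid_neighbors,
              h=lambda p: abs(p[0] - 2) + abs(p[1] - 0))
```

With an admissible heuristic like Manhattan distance, A* returns the same optimal path Dijkstra would, while expanding fewer nodes.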
about searching in that space how do you get from one location space to another [00:25:28] get from one location space to another location or how do we get from one robot [00:25:30] location or how do we get from one robot configuration to another robot [00:25:32] configuration to another robot configuration [00:25:34] configuration following search the week after we're [00:25:36] following search the week after we're going to talk about mvps and games right [00:25:38] going to talk about mvps and games right mvps or markup decision processes [00:25:41] mvps or markup decision processes basically the idea there is the world [00:25:43] basically the idea there is the world has uncertainty and we should actually [00:25:45] has uncertainty and we should actually model those probabilities and [00:25:47] model those probabilities and uncertainties and that commonly shows up [00:25:49] uncertainties and that commonly shows up when you think about robots interacting [00:25:51] when you think about robots interacting with each other or with the world in [00:25:52] with each other or with the world in general right like when you have dynamic [00:25:54] general right like when you have dynamic environments around you when you have a [00:25:56] environments around you when you have a self-driving car driving right next to [00:25:58] self-driving car driving right next to other cars right that that you can model [00:26:00] other cars right that that you can model that as an mvp and similarly if you [00:26:03] that as an mvp and similarly if you think about this interaction with [00:26:05] think about this interaction with another intelligent agent you can model [00:26:07] another intelligent agent you can model that as a game and then we will be [00:26:08] that as a game and then we will be discussing that in a couple of weeks and [00:26:11] discussing that in a couple of weeks and these ideas show up a lot in robotics so [00:26:14] these ideas show up a lot in robotics so 
[00:26:14] So here, the video on the left basically shows two robots that are trying to coordinate with each other. What they're trying to do is move this rod together, but the interesting thing is that they're decentralized: they don't talk to each other, and they have different observability. The robot in the front can see the books here, and the robot in the back can only see the boxes, and just from the forces, the feel of the forces, they can understand what the other agent is doing. They can learn what the other agent's policy is and coordinate with that agent to do this collaborative type of maneuver.

Here's another example, also from my lab. Here we're looking at two robots playing air hockey, and again we have this game-theoretic
paradigm of two robots trying to coordinate with each other. [00:27:06] There's a bit more learning happening here: the robots are trying to learn the policy of the other agent, or a representation of the other robot's policy, and based on that, kind of trick the other agent, or influence the other agent, and win this air hockey game.

So, okay: MDPs and games pretty much show up for any type of interactive system, and as robots are leaving factory floors, they have more and more interactions with people or with other agents, so these ideas are super useful, again, for the planning part of robotics.

We will see Bayesian networks immediately after that. Bayesian networks, again, are super useful when it comes to things like mapping and estimation. So here there's this algorithm called simultaneous localization and mapping, SLAM, and basically the idea is that when you go to a new environment
and you don't know anything about this new environment, [00:27:57] you're going to sample points, and based on that you're going to create a map, navigate yourself around it, and estimate where you are. So a lot of ideas around Bayesian networks show up here. Actually, in one of the homeworks we're going to look at things like particle filters and state estimation, and, based on that, how we use ideas from Bayesian networks to do better estimation of where we are and where other agents are. Again, that is super useful for any robotic system that tries to do anything in the world, basically.

[00:28:32] All right, so a lot of that was around planning. Logic is another topic we will discuss in this class. It is not used as much in robotics, but it does show up in various places.
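As a preview of the particle-filter idea from the homework, here is a minimal one-dimensional sketch; the motion model, sensor model, and noise levels are made up for illustration:

```python
import math
import random

def particle_filter_step(particles, control, measurement,
                         motion_noise=0.1, sensor_noise=0.5):
    """One predict-update-resample cycle of a toy 1-D particle filter.
    The particle set approximates the belief over the robot's position."""
    # 1. Predict: push every particle through the (noisy) motion model.
    moved = [p + control + random.gauss(0.0, motion_noise) for p in particles]
    # 2. Update: weight each particle by the likelihood of the sensor reading
    #    under a Gaussian sensor model centered on the particle's position.
    weights = [math.exp(-(measurement - p) ** 2 / (2 * sensor_noise ** 2))
               for p in moved]
    # 3. Resample: draw a new particle set in proportion to the weights.
    return random.choices(moved, weights=weights, k=len(moved))

random.seed(0)
# Belief starts out uniform: we have no idea where the robot is on [0, 10].
particles = [random.uniform(0.0, 10.0) for _ in range(2000)]
true_pos = 0.0
for _ in range(6):
    true_pos += 1.0                              # robot drives forward 1 unit
    reading = true_pos + random.gauss(0.0, 0.3)  # noisy position sensor
    particles = particle_filter_step(particles, control=1.0,
                                     measurement=reading)
estimate = sum(particles) / len(particles)       # posterior mean of belief
```

After a few sense-and-move cycles, the particle cloud collapses around the robot's true position, which is the "estimate where you are" step of SLAM in miniature.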
[00:28:46] So here, this is actually work by Kress-Gazit's group, and this is called LTLMoP, which is basically a tool. The idea here is that they try to get this robot to navigate the space and go to various squares here, and while doing that, it tries to satisfy some logical formula. So instead of giving it an objective, a loss function, and then doing, let's say, gradient descent on that to try to come up with a policy, what this robot does is take that logic formula and, based on it, create a plan for how to navigate this space. And why would anyone want to do that? Well, the reason is that if you have that logic formula, you can actually prove things about this robot: you can actually prove whether it would or would not satisfy the specification. And again, that is very useful when you think about the safety of, let's say, autonomous cars.
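The "prove things about the robot" idea can be illustrated by checking a finite plan against two classic temporal properties, "always avoid the obstacle" and "eventually reach the goal"; this toy checker and its grid plans are invented for illustration and are not the LTLMoP tool itself:

```python
def always(pred, trace):
    """G p: the predicate holds at every step of the (finite) trace."""
    return all(pred(state) for state in trace)

def eventually(pred, trace):
    """F p: the predicate holds at some step of the trace."""
    return any(pred(state) for state in trace)

# Hypothetical workspace: one obstacle cell and one goal cell on a grid.
obstacle = {(1, 1)}
goal = (2, 2)

# A plan is just the sequence of grid cells some planner produced.
safe_plan = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
bad_plan = [(0, 0), (1, 1), (2, 2)]       # cuts through the obstacle

def satisfies_spec(plan):
    """Spec: always avoid the obstacle AND eventually reach the goal."""
    return (always(lambda s: s not in obstacle, plan)
            and eventually(lambda s: s == goal, plan))
```

Tools in this space go further: rather than checking one trace after the fact, they synthesize a controller guaranteed to satisfy the temporal-logic formula on every execution.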
[00:29:34] And again, that is very useful when you think about the safety of, let's say, autonomous cars: if you want to prove that your autonomous cars are safe, you would need to add a little bit of logic there; you would need to think about how that can be used in planning. It also helps with transparency, because there is a smaller gap between, let's say, natural language and temporal logic, which is the logical language they're using here, and that smaller gap can give us a more transparent, clear idea of what the robot is doing. Okay.

[00:30:05] All right, so that was all planning and the topics we are discussing in this class. We are currently talking about machine learning. A common place that machine learning shows up in robotics is on the sensing side of things: you sense the world, and based on that you perceive the world. Perception and vision are a big part of robotics, and a lot of that is done using machine learning these days.
[00:30:31] So you have a machine learning network that basically tries to do object recognition and activity prediction of what the other objects around you are doing, or, in this case I think, who the owner of the car is, and what the other objects around us are, and things of that form. That's a very common place machine learning shows up in robotics: the sensing and perception side of things.

[00:30:56] And you might ask about this acting part: what goes into the acting part? It's not just AI that shows up in robotics; a bunch of other fields show up too. Specifically, control theory and optimization are the core of the acting component of this sense-plan-act architecture. The idea is that you might again have an objective, like following a trajectory, and you actually want to apply the right control, the right accelerations and steering angle, to your autonomous car in this case, to get it to navigate here.
[00:31:30] A lot of that, the core of it, is actually done using ideas from control theory. More recently, people have been using ideas from machine learning here too, like adding ideas from deep learning for actually planning and getting the car to navigate in this space.

[00:31:46] So those are some of the core ideas. As I mentioned earlier, there are some other paradigms besides sense-plan-act. One specific paradigm, which I think is pretty interesting, is to use machine learning to do all of sense, plan, and act by trying to learn from humans. This is commonly referred to as learning from demonstrations, or imitation learning, and the idea is that if I just watch how humans do things, then from that I can directly figure out what their objective was or what their policy is.
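The trajectory-following control mentioned a moment ago can be illustrated with a minimal PD (proportional-derivative) controller on the steering angle. The gains, limits, and one-line "car model" below are toy assumptions, not how a real autonomous car is controlled:

```python
def pd_steering(cross_track_error, error_rate, kp=2.0, kd=0.5, max_angle=0.5):
    """PD control: steer against the cross-track error and its rate of change,
    clipped to a (made-up) physical steering limit in radians."""
    angle = -kp * cross_track_error - kd * error_rate
    return max(-max_angle, min(max_angle, angle))

# Toy simulation: the car starts 1 m to the side of the desired path,
# and (as a crude model) the error just integrates the steering command.
dt = 0.1
error, prev_error = 1.0, 1.0
for _ in range(100):
    rate = (error - prev_error) / dt
    prev_error = error
    error += pd_steering(error, rate) * dt

print(abs(error) < 0.05)  # True: the car has settled back onto the path
```

The derivative term damps the correction so the car does not oscillate around the path; real controllers add an integral term, a vehicle model, and much more.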
[00:32:21] This idea has been around since the 2000s in robotics. The work on the left that I want to show is this idea being applied directly to robotics: this is work that Pieter Abbeel and Andrew Ng did in 2004 at Stanford. Basically, there are these helicopters, and before then it was really hard to fly them by just using AI and control; it was actually just really hard to fly them. And then there are these pilots who can fly these helicopters much more easily, much more simply. So the idea was: could we use that here, to get this helicopter to fly in this space?

[00:33:16] By watching a human pilot fly, it then learns to fly by itself.
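The helicopter work used apprenticeship learning (inverse reinforcement learning over learned dynamics); the simplest member of the learning-from-demonstrations family is behavior cloning, which just does supervised learning from expert state-action pairs. Below is a minimal sketch with an invented linear "expert"; everything here is made up for illustration:

```python
import random

def fit_linear_policy(states, actions, lr=0.1, epochs=200):
    """Behavior cloning as least squares: fit action = w . state
    to (state, action) pairs recorded from an expert, by gradient descent."""
    dim = len(states[0])
    n = len(states)
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for s, a in zip(states, actions):
            pred = sum(wi * si for wi, si in zip(w, s))
            for i in range(dim):
                grad[i] += 2.0 * (pred - a) * s[i] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Invented "expert" that steers with action = 1.5*err - 0.4*err_rate.
random.seed(0)
states = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(100)]
actions = [1.5 * s[0] - 0.4 * s[1] for s in states]

w = fit_linear_policy(states, actions)
print([round(wi, 2) for wi in w])  # recovers roughly [1.5, -0.4]
```

Behavior cloning recovers the expert's policy directly; apprenticeship learning instead recovers the expert's reward and then plans against it, which generalizes better when the learner drifts away from the demonstrated states.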
[00:33:21] So what it does is watch the person fly, and then it tries to fly the same stunt maneuvers by itself, maybe trying a few times until it nails it. What you're seeing is the end result of this machine learning process, called apprenticeship learning.

[00:33:39] So that was this idea of apprenticeship learning, where, kind of for the first time, people were able to fly these helicopters autonomously, by learning from human pilots, from human experts. That idea has been around in research, and people in general have been thinking about how we can learn from humans, how we can get robots to act in the world by directly learning from humans, not just from demonstrations but also from things like asking for their preferences or giving language instructions. In my lab, actually, we're doing a lot of work in this domain, where we're looking at preference-based learning.
[00:34:15] We actively query people for which trajectory they prefer, in order to get a robot to play some version of mini golf here: aiming for one of these balls and getting the robot to actually hit the ball correctly so it ends up in the right goal. And that's kind of exciting, because you can learn from all sorts of human feedback: you can learn from demonstrations, comparisons, language, and from that you are able to get a robot to do an interesting type of maneuver.

[00:34:45] Another place that machine learning shows up is, again, by combining sensing, planning, and acting into one giant box: starting from visual data, can you try to directly get actions on your robot? This idea of robot learning has been very popular in recent years.
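A common way to learn from pairwise comparisons like these is the Bradley-Terry model: fit reward weights so that the preferred trajectory gets higher reward, via logistic regression on feature differences. The features and "simulated person" below are invented for illustration; this is not the lab's actual pipeline:

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def learn_reward_from_preferences(pairs, lr=0.1, epochs=100):
    """Bradley-Terry preference learning: fit reward weights w so that
    P(A preferred over B) = sigmoid(w . (phi(A) - phi(B))).
    Each pair is (features_of_preferred, features_of_rejected)."""
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for fa, fb in pairs:
            diff = [a - b for a, b in zip(fa, fb)]
            p = sigmoid(sum(wi * di for wi, di in zip(w, diff)))
            # Gradient ascent on the log-likelihood of the observed preference.
            w = [wi + lr * (1.0 - p) * di for wi, di in zip(w, diff)]
    return w

# Hypothetical features per trajectory: (smoothness, distance-to-goal).
# The simulated person prefers smooth trajectories that end near the goal.
random.seed(1)
true_w = [1.0, -2.0]
pairs = []
for _ in range(200):
    f1 = (random.random(), random.random())
    f2 = (random.random(), random.random())
    r1 = sum(tw * f for tw, f in zip(true_w, f1))
    r2 = sum(tw * f for tw, f in zip(true_w, f2))
    pairs.append((f1, f2) if r1 > r2 else (f2, f1))

w = learn_reward_from_preferences(pairs)
print(w[0] > 0 and w[1] < 0)  # True: the learned reward has the right signs
```

Active querying then picks the next pair of trajectories to show the person so that their answer is maximally informative about w.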
[00:35:06] This is work from Berkeley from 2015, where basically the idea is to get the robot to just try things out, and then, from these images and the joint angles of what the robot tries, it learns how to achieve the task. It's a very data-intensive kind of process, but there's a lot of excitement, because you can achieve things you weren't able to achieve before.

[00:35:29] And we have seen a lot of advances in machine learning when it's applied to, let's say, NLP and vision, places where we have a lot of data. We don't have that much data in robotics, but if that is the bottleneck, then maybe we can create an arm farm, kind of like in this video here, and just collect lots and lots of data of robots moving randomly inside this box, and from that learn how to grasp any object that we see.
[00:35:59] This is actually work by Google Robotics; Google X has a subgroup, Google Robotics, that does a lot of interesting work on robot learning. It's very data- and compute-intensive, but there's lots of excitement around this idea of directly learning actions from sensing data. Okay.

[00:36:24] All right, so those were all the things I wanted to say on how AI is used in robotics. But AI is not the only thing used in robotics; as you've probably noticed, robotics spans a bunch of different departments. For example, you see robotics in mechanical engineering, and that has a very different view of robotics: that view is usually focused on design and co-design, which is a super important problem.
[00:36:52] If you're thinking about building an arm, or building a hand that needs to do very precise manipulation: what type of sensors are you using? How are you building this system? Those are all really good questions. How do we make sure we build a prosthetic that is not too heavy, that is comfortable, and that is also very safe for the person to use for walking? These are all really great design questions that are super important in robotics.

[00:37:17] Another reason this is important is that there's a whole new area around co-design, which basically says: for whatever hardware we pick, there's going to be some AI algorithm, but if I change that hardware, my AI algorithm is going to change, and if I change my AI algorithm, it could run very differently on different hardware.
[00:37:37] So can we design both of these at the same time, designing what our robot should look like and what algorithms should run on it? Or can we have reconfigurable robots? There's a lot of excitement around this area in general, when you think about design and co-design of these systems.

[00:37:54] I just want to show a few other cool designs that are out there that are very impressive. One design I want to show is this robot on the left. This robot is a tensegrity structure: it basically has a bunch of rigid links, like these guys, and these links are connected by things that are kind of like ropes. It's kind of a funny structure, but the interesting thing about it is that it is shock-resistant. NASA really cares about this robot, because landing robots on the surface of Mars is really difficult.
[00:38:32] But if I just drop this robot, nothing happens to it, and it can just roll around and continue moving forward, which is again a very interesting design for something you don't want to break. Robots in general are pretty rigid, and this robot is very flexible, so there's a lot of interest in building robots that are softer, less rigid, more flexible, and this is an example of that.

[00:38:56] Another robot that I think is kind of fun, and I'll show a video of it a little later, is this robot from Stanford, from Mark Cutkosky's lab, called Stickybot. It's basically a robot that has gecko-like hands: its hands are inspired by the hands of a gecko, with these adhesive pads, and because of that it can climb up walls and really slippery slopes, which is again a very interesting design.
[00:39:25] Another design that I find super interesting is this inflatable snake robot. This is from Allison Okamura's lab, again in mechanical engineering at Stanford, and the idea here is that this robot can inflate itself as it goes through different parts of the environment. It might be really difficult, for example, to go through this hole, but as the robot inflates, it's actually making its way through these kinds of narrow spaces and getting to various areas of the space.

[00:39:54] This can also be used inside the body: for example, when we're doing a more intelligent type of endoscopy, we can send some of these robots inside the body and navigate a little bit better, for robotic surgery, for things like endoscopies, and so on.

[00:40:10] I can take any questions now, actually, about any of this.
[00:40:16] A quick question: you mentioned that robotics encompasses stuff like computer vision, machine learning, control, all working in the same system. So what do you recommend for people who want to go into the field? Do you study all of it, or is there a particular area?

[00:40:34] Yeah, that's a very good question. It kind of seems like a giant thing, right, because it incorporates everything. And the field of robotics in general, when you go to the conferences, is interesting, because you see people from all these different fields, but they're coming together for the same problem, not the same technique, which makes it a very interesting field to be in. But at the end of the day, everyone focuses on their own expertise and then brings it together as part of a team.
[00:41:02] So, for example, robotics in CS at Stanford, and I'll actually talk about that a little bit: there's a lot of focus on developing AI algorithms, the algorithmic side of this, but not as much on the design side. Robotics in mechanical engineering, on the other hand, is very focused on building new designs, new structures. And we do have a lot of joint projects where we use new designs and try to develop new algorithms for them, and lots of collaboration across these different fields. So it is a very interdisciplinary field, but as I said, even though it might seem too large, it's not that large at the end of the day: everyone focuses on the thing they're actually very interested in, with the same goal of building robots.

[00:41:43] I have a question: what subfields of robotics would be really great to go into a startup for right now?

[00:41:51] To start a startup, right, is that what you mean? Yeah, so that's a good question to ask.
[00:41:57] I think there's a lot of excitement around autonomous driving. Autonomous driving these days is very focused on vision, machine learning, and control theory, so those three backgrounds are commonly used there, though again, a lot of big companies are doing it, so it's not necessarily startups. But beyond autonomous driving, people are very interested in domestic robots these days: actually getting robots inside people's houses, which wasn't the case even a few years back. We have robots functioning very well on factory floors, but having robots in our homes is a big problem, and there were some startups that weren't very successful, so it's kind of an edgy area to be in. But I think there's a lot of excitement there, like home automation.
[00:42:45] Things like the next generation of Roombas, or other things that we can have in homes. And again, for that, I think a lot of these systems are using machine learning and AI in general, so that is a huge background you would want to have, but the design is pretty important there too: the type of hardware design you use is actually super important. And healthcare: thinking about robotics being used in healthcare, in hospitals, and things of that form, I think that is also a very exciting area.

[00:43:14] Thank you.

[00:43:16] Want to ask a question?

[00:43:18] Yeah, kind of a follow-up, about robotics research. As you said previously, right now there are a lot of things happening in AI, and there's also a lot happening in control theory, so
is there any research moving these two things together? In my undergrad I studied control theory and also AI, and I found that control theory has some really interesting concepts, like stability, like observability. Is there any research on using those concepts in, for example, reinforcement learning, in learning-based systems, which are missing things like stability?

[00:44:06] Yeah, there's a lot of excitement actually around that, and I totally agree: there are a lot of interesting topics in control theory, and a lot of interesting topics in AI and machine learning, that are just making their way into robotics.
clashes [00:44:23] there was a little bit of like clashes between them i would say but now i think [00:44:25] between them i would say but now i think there's a lot of like coming together [00:44:26] there's a lot of like coming together and trying to combine those ideas [00:44:28] and trying to combine those ideas there's a new conference called learning [00:44:30] there's a new conference called learning for uh dynamics and control of what we [00:44:33] for uh dynamics and control of what we see and the whole point of that is [00:44:35] see and the whole point of that is actually bring in learning people and [00:44:37] actually bring in learning people and control people together to try to use [00:44:39] control people together to try to use the same ideas yeah lots of research [00:44:41] the same ideas yeah lots of research trying to bring learning and control [00:44:42] trying to bring learning and control together and i think that is actually [00:44:44] together and i think that is actually the right direction because as you said [00:44:46] the right direction because as you said lots of interesting ideas in dynamics [00:44:47] lots of interesting ideas in dynamics and control and i think a lot of those [00:44:49] and control and i think a lot of those ideas could be used as prior structures [00:44:53] ideas could be used as prior structures uh that could be put on learning based [00:44:55] uh that could be put on learning based systems so when you're let's say [00:44:56] systems so when you're let's say training a neural network you can bring [00:44:58] training a neural network you can bring in structure that you know about the [00:44:59] in structure that you know about the system that comes from control theory [00:45:01] system that comes from control theory let's say [00:45:03] let's say [Music] [00:45:06] [Music] i had a question [00:45:07] i had a question sure [00:45:08] sure sure i was wondering you know you talked [00:45:10] sure i was wondering you 
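The idea above, using known control-theoretic structure as a prior on a learned model, can be sketched in a few lines: keep a trusted stable model and learn only a residual correction on top of it. This is a hypothetical illustration with made-up dynamics, not code from the course.

```python
import numpy as np

# Sketch of "control-theoretic structure as a prior": we trust a known stable
# linear model A_prior and learn only a small residual correction from data,
# instead of learning the full dynamics from scratch. All numbers are made up.

rng = np.random.default_rng(0)

A_prior = np.array([[0.9, 0.1],
                    [0.0, 0.8]])   # known stable model (spectral radius < 1)

def true_step(x):
    """The 'real' system: the prior plus a small unknown nonlinear term."""
    return A_prior @ x + 0.05 * np.array([np.sin(x[1]), 0.0])

# One-step transition data (x_t, x_{t+1}).
X = rng.normal(size=(200, 2))
Y = np.array([true_step(x) for x in X])

# Fit only the residual Y - X A_prior^T with ridge regression; the penalty
# keeps the learned part small, so the stable prior dominates the model.
R = Y - X @ A_prior.T
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ R)

err_prior = float(np.sum((Y - X @ A_prior.T) ** 2))            # prior alone
err_model = float(np.sum((Y - (X @ A_prior.T + X @ W)) ** 2))  # prior + residual
assert err_model < err_prior   # the learned residual strictly improves the fit
```

The ridge penalty guarantees the fitted model is at least as accurate on the training data as the prior alone, while keeping the learned correction small, which is one simple way "structure from control theory" can anchor a learned system.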
[00:45:06] I had a question.

Sure.

I was wondering: you talked about arm farms and collecting lots of data. Do you feel like the field is more data-limited, or more algorithm- and learning-limited? I think about when I learned to drive: it wasn't that much data, it was just maybe a couple weeks of practice, and then I was ready.

[00:45:30] That's a very good point. I think it's a combination. I do think the field is very data-limited, and it's interesting because, yes, when you learn to drive you spend a couple of weeks, but you have seen cars drive right next to you. Learning by observation is a very interesting type of learning: you're not learning by doing, you're learning by observing other people driving right next to you, and that has so much information in it. It's kind of the same problem I mentioned earlier with the house-of-cards example: your autonomous car doesn't know that it's important, but you would, because you have so much context about the world. Because of that, I think the issue, specifically for autonomous cars, is some of these corner cases. Driving on a highway, that's basically solved; the issue is the corner cases the system hasn't seen yet in the data, and maybe more data will solve that. So I think more data is definitely needed. I think we can still do better on our algorithms too, but data is, I would say, the bigger issue, at least in autonomous driving.

[00:46:34] On the data side, do you feel that synthetic data could be something useful for machine learning applications, or is that something that's always going to be a fantasy?
[00:46:45] I think it's super useful. If you can create near-accident driving scenarios, train your car in those settings, and generate that data automatically, that would be super useful; then you don't need to wait forever to see a near-accident scenario on the vehicle. I think one issue there is the simulation-to-reality gap, which is a big problem specifically for robotics, but I do think generation of data is important, yeah.

[00:47:10] Hi, I have a question regarding the determinism of robotics. In terms of machine regulations, my government usually requires the actions of a machine to be stated, and the manufacturer of the machine is sometimes responsible for those actions. But machine-learning-based or deep-learning-based algorithms are statistical by definition, so how do we define the responsibility of the manufacturer of the machines? And if there's an accident caused by an autonomous driver, who should be responsible in that case?

[00:47:48] Yeah, that's a very good question, actually a very good question for Tino next week. So, on the first point you made, that all the laws are about deterministic systems: that's actually not always the case. For example, Mykel Kochenderfer in the Aero/Astro department has done a lot of work around POMDPs that actually run on aircraft systems. There's this ACAS X system for unmanned aircraft; all the motions, landing and taking off and all of that, are done in an online setting, but the system is small. It's a POMDP with a few states, and you can verify everything about it.
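As an aside, the belief tracking at the heart of a POMDP can be sketched in a few lines. All numbers below are made up for illustration and have nothing to do with the real ACAS X models.

```python
import numpy as np

# Minimal sketch of a discrete POMDP belief update. The agent never observes
# the state directly; it maintains a belief b(s) and updates it after each
# action/observation pair:  b'(s') ∝ Z(o | s') * sum_s T(s' | s, a) * b(s).
# States, transitions, and observation likelihoods here are illustrative.

T = np.array([[0.9, 0.1],    # T[s, s'] under some fixed action
              [0.2, 0.8]])   # states: 0 = "clear", 1 = "intruder nearby"
Z = np.array([[0.85, 0.15],  # Z[s', o]: observation likelihoods
              [0.30, 0.70]]) # observations: 0 = "quiet", 1 = "alert"

def belief_update(b, obs):
    """One Bayes-filter step: predict through T, correct with Z, normalize."""
    predicted = b @ T               # sum_s T[s, s'] * b[s]
    unnorm = predicted * Z[:, obs]  # weight by observation likelihood
    return unnorm / unnorm.sum()

b = np.array([0.5, 0.5])            # start maximally uncertain
for o in [1, 1, 1]:                 # three "alert" observations in a row
    b = belief_update(b, o)

# Repeated alerts should make "intruder nearby" the more likely state.
assert b[1] > 0.5
```

Because the belief lives on only a few states, exhaustively checking properties of a policy over this model is feasible, which is what makes the verification story for small POMDPs so much easier than for large neural networks.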
So there's a lot of interesting work around verification and validation in this space, and even if a system is not deterministic you can still verify it. That small POMDP, a partially observable Markov decision process, is something you won't see in this class; if you're interested in that topic, take Mykel's class. But when it comes to neural networks, yeah, we don't really have that many guarantees around them, and there's a lot of discussion here. Some people are taking the route of trying to prove things and verify neural networks; Clark Barrett is someone in the CS department who does a lot of work on verification of neural networks, but again we are limited in size there, so we can't have giant neural networks. Another kind of perspective on this is giving statistical guarantees.
If my autonomous car is statistically safer than a human, maybe that is good enough, and we're okay with some number of accidents some number of times. And some of it is an acceptance issue too, right? The first aircraft that were out there were probably not safe, and people were probably okay with that; the number of deaths was higher. I think there's a little bit of that acceptance question in how this is going to pan out. But it is actually a very good question how we are going to regulate this and who is going to be to blame. One can take Tesla's approach and say a human is always in control, so if anything happens it was the human's fault, which is kind of a weird type of approach, I would say, and not necessarily the safest way to go. But yeah, I honestly don't have good answers for this; it's something to ask Tino next week.
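To make the statistical-guarantee idea concrete, here is a back-of-the-envelope bound on what test driving can actually certify. The human baseline below is an assumed round figure (roughly one fatality per hundred million miles), not a sourced statistic.

```python
import math

# With zero accidents observed in n independent trials (here, miles), the
# exact one-sided 95% binomial upper bound on the accident rate p solves
# (1 - p)^n = 0.05, which gives p ≈ 3/n (the "rule of three").

def upper_bound_zero_failures(n, confidence=0.95):
    """Exact one-sided upper confidence bound on p after n failure-free trials."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

human_rate = 1e-8            # assumed baseline: ~1 fatality per 1e8 miles
miles_tested = 10_000_000    # 10 million failure-free test miles

bound = upper_bound_zero_failures(miles_tested)
print(f"95% upper bound: {bound:.2e} per mile")   # ~3e-7, still 30x the baseline

# Failure-free miles needed before the bound even reaches the human rate:
miles_needed = math.log(0.05) / math.log(1.0 - human_rate)
print(f"miles needed: {miles_needed:.2e}")        # ~3e8 failure-free miles
```

The point of the exercise: even ten million failure-free test miles only bound the rate at about 3e-7 per mile, so demonstrating "statistically safer than a human" by test driving alone takes on the order of hundreds of millions of miles, which is part of why this argument is contentious.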
[00:50:03] I have a question with regards to the co-design. You mentioned that the hardware and the AI algorithm need to go hand in hand. For example, the self-driving car's algorithms have to make quick decisions with real-time changes in the environment, and the algorithms can take a long time to run. On the hardware side, there are a variety of ways the algorithm can be deployed, say on a GPU or on other platforms. So are there any pointers toward how these two go hand in hand, and what is best?

[00:50:52] So I was thinking about it more in an offline fashion. In an offline fashion, you can have a fancy algorithm that does everything but takes a lot of compute, running on hardware that is very simple, or you can increase the complexity of your hardware and, on the other hand, have a really simple algorithm that runs on it. I wasn't really thinking about the online aspect, where you're right, they're running at different frequencies, so how could they work together. One example of this tradeoff is assistive robotics, assistive teleoperation, where you're using a joystick to control a robot arm; this is something people work on commonly. You can make the hardware very intricate, like using haptic devices, and then be able to control things much more easily. On the other hand, you can have hardware that's really simple. For example, there are these sip-and-puff devices that a lot of patients with disabilities use. It's a very simple device, you can only sip and puff, that's the only thing you can do, but then the algorithm underneath needs to be much more complicated to capture what that sip-and-puff means. So that's one place this interplay between hardware and algorithm really shows up.

[00:52:05] All right, thank you guys. Okay, so let me continue a little bit, and then I'll stop at the end of this section and I can take more questions too; after that I want to show some of these applications at some point. This section is small: the robotics-at-Stanford one. Okay, so we talked about all of these, we talked about the history. Robotics at Stanford had an interesting history too, so it does have an old history. Here's a video that I just wanted to show for fun. This is a video from Oussama Khatib's lab, and Oussama always has the best videos. He has
these two robots, Juliet and Romeo. The other fun thing in this video for me is that lots of these people are now faculty, or very famous in the field, so it's fun to see them as grad students at Stanford. This is Oliver Brock. And this is Gates: if you look at it closely, it's the first floor of Gates, and it hasn't changed much. This robot is actually still on the first floor of Gates, so if you get a chance to go there, it's still sitting there. This is basically getting a robot helper to help you do various sorts of things, like move objects for you. They have basically two of these; let me move forward a little bit. So it helps you carry objects and things of that form. It's a very old robot, and some of the concepts you study now, even thinking about interaction between humans and robots, they were actually thinking about back in the day. This is collaborative transport: these robots are not decentralized, they actually have centralized control, but they are compliant, meaning they're not rigid; if you move one, it moves with you. And then later on there's this video of dancing with Romeo; again, it's compliant, and it kind of moves around you.

[00:53:57] So those were some old videos from Stanford robotics. A more recent video of Stanford robotics successes, I guess, is about the DARPA Grand Challenge. It's not that recent; it's from 2005. The DARPA Grand Challenge was a competition that DARPA put out, basically trying to get researchers to work on autonomous driving. This was the 2005 competition where Sebastian Thrun was heading the Stanford team, Stanley was the vehicle, and it actually won the competition.

[00:54:49] (Video) "In case you don't recognize it, that is a Volkswagen Touareg in the finish-line configuration. Ladies and gentlemen, boys and girls, it's been done." [Applause]

[00:55:13] So that is Stanley crossing the finish line. After this, Sebastian Thrun actually left Stanford, joined Google, and started the Google self-driving car group, now Waymo. There have been lots of advances in autonomous driving since then, but this was one of the big successes of Stanford robotics, winning the DARPA Grand Challenge, which is
very exciting. But in general, robotics at Stanford falls into a bunch of different departments. In computer science, here are some of the faculty; I just wanted to show their faces so you know who they are and can take classes from them later on. Oussama Khatib: I've already shown a lot of videos from his lab, and I have one more that I'll show later. He also does a lot of work around field robotics, meaning "I'm going to send robots to places that humans haven't seen before and see what happens," which is really exciting. Then we have Ken Salisbury, who does a lot of work around helper robots, building systems that can actually help people. Silvio does a lot of work around vision and robotics: he's primarily a vision faculty member, but he's thinking about that intersection of vision and robotics. And then some of the more recent people who have joined, including myself, Jeannette, and Chelsea: Jeannette does a lot of work around manipulation; I am personally very interested in interaction, so thinking about multi-agent interaction or interaction with humans; and Chelsea does a lot of work around robot learning, meta-learning, and things of that form.

In addition to these faculty, there are other folks in the CS department whose work is closely related to robotics. Fei-Fei does quite a bit of work in vision but is also interested in that robotics intersection. And Karen Liu and Jiajun Wu, who both recently joined Stanford, do a ton of work around physical simulation, graphics, things of that form, and that has a lot of relations to building robots that can work with deformable objects and things of that form.
And some folks who used to do robotics, I guess, are Andrew and Sebastian. I showed a video of Andrew's learning-from-demonstration work earlier, the helicopter flying video, and Sebastian has done a lot of work in autonomous driving. They're both still around: Andrew does a lot of work in healthcare these days, and Sebastian comes in as an adjunct faculty member now.

Outside of computer science, we still have a lot of robotics faculty. In the Aero/Astro department we have Grace Gao, Mac Schwager, Marco Pavone, and Mykel Kochenderfer; I mentioned Mykel's work around aircraft systems earlier, building these ACAS X systems and trying to prove properties about them. They all do a lot of work around drones, quadcopters, helicopters, things of that form, multi-agent systems, and being able to get guarantees and talk about risk.

And finally, in mechanical engineering we have a good number of faculty: Allison Okamura, Sean Follmer, Mark Cutkosky, Steve Collins, and Monroe Kennedy. Almost all of them do quite interesting work on design too, building systems that are actually interesting and useful. The sticky-bot I showed earlier was from Mark's lab, and the snake robot was from Allison's lab. Sean does a lot of interesting work at the intersection of robotics and HCI, so if that is something you're interested in, you should check out what these faculty teach. So that was my very quick robotics-at-Stanford overview. Let me spend another five minutes showing some of these applications, and maybe after that I'll take questions for the last five minutes. And I have a seven-
minute video that I'll just leave up after class for you to watch; it's a 50-year history of robotics, which is kind of fun. [00:58:57] All right, so I wanted to show you some exciting applications of robotics. I actually had a hard time classifying them, because they can be classified along different axes, but I ended up putting them into three main groups. [00:59:15] The first group is bio-inspired robots, which is basically: let's look at biology and try to build robots that are useful, so a lot of interesting design goes on there. Another interesting direction is soft robotics, meaning building systems that use soft materials; the tensegrity structure was an example of that. They're flexible, they're soft, they're not rigid, they're not
going to break. I'm not actually going to talk much about soft robots, but I am going to talk about manipulating soft objects, which is a very difficult algorithmic question. And then finally, if I have time, I will talk a little bit about domestic and interactive robots, which I think is really exciting: this interaction with humans is something you should really care about as robots start to interact with us. [01:00:03] All right, so bio-inspired robots. This is more of an interesting design question. From early on, everyone was interested in humanoids, because you want robots to look like humans for some reason, so there's a lot of work around building robots that look like humans, that is, they have two arms, two hands, two legs, and a face. But at some point people realized robots don't need to look
like humans, and they started looking at nature in general and thinking about bio-inspired robots more broadly. There are a lot of animals that can get to places humans can't, and we can build robots that are similar to them. [01:00:45] Another interesting topic that shows up here, specifically under humanoids, is this idea of walking. People have been obsessed with walking for years now, and it's an interesting problem: if you want to build a robot that walks like a human, that is still very difficult. Walking robots have weird gaits; they don't really walk human-like, and when they do, they're just super inefficient. Humans are amazing at walking, and getting robots to walk is in general a very active area of research. Why do we care about that? Well, first off, it's an
interesting question; second, building exoskeletons, systems that can help people walk, has always been an interest in this field. [01:01:27] So let's look at a few bio-inspired robots; I mostly just want to show videos of these systems. One type of bio-inspired robot comes from looking at insects like cockroaches and trying to build robots that act like cockroaches, because they're amazing at getting through obstacles. There's a team at UC Berkeley, Ron Fearing's group, that designs robots that are similar to cockroaches, and they go through places like cockroaches. The nice thing is that they're very agile and they get through things; the other thing is they're small and can be super fast, so you can have a swarm of these robots get to places quickly. Another interesting thing about cockroaches is that when they navigate, they use their antennae, so
that is actually how they figure out where the wall is: using the antennae, they kind of follow the wall. People in Ron's group have been using similar ideas to sense the world and actuate in it, and they even build these robots using origami, which is interesting because it keeps them small and light and they don't take as much battery power and energy. Let me actually move to this one. [01:02:44] The other bio-inspired robot, which I showed you a little earlier, is the sticky bot from Mark Cutkosky's lab. Basically, if you look at gecko feet, they have these very, very tiny suction cups that attach to glass, and this robot uses a similar type of paradigm, and it can just
walk up really slippery slopes, like polished granite, which is super impressive. So lots of cool design going on here. [01:03:20] Similarly, snake robots and eel robots are very popular; they have a lot of links connected to each other and can navigate easily. This is an eel robot, an underwater robot; it has a camera in the front and navigates based on that, which is kind of cool. [01:03:42] And then this is a hopper, again a robot from Ron's group, Salto, which jumps around like a bush baby. It's kind of cool. If you have used things like MuJoCo, which is a simulation environment, you might see these random animals in it; part of the reason is that roboticists care a lot about different types of animal motion, like swimming, hopping, and
that's why those show up in MuJoCo-type environments, which is a physics simulator that lets you train things in simulation. [01:04:19] All right, so those were bio-inspired robots. As I mentioned earlier, we've been obsessed with humanoids, so lots of energy and money goes into building them. Honda actually spent a lot of money building this robot called ASIMO. Its walking is kind of weird. [Music] "I am happy to be here with you today. Thank you. I'm excited to be here in Washington, D.C." [Music] [01:05:03] All right, I'm going to cut it there. But yeah, humanoids have been really exciting. One other example I want to show is sending humanoid-type robots to places that you wouldn't be able to send people before, so I want to show a
video of this robot from Oussama Khatib's group, OceanOne; some of you might have seen this video. This is not a full humanoid, but it does have two arms, so you can teleoperate it and get the robot to do various things, and it goes underwater to places that people have not been able to go before. So let's just quickly watch this video; it's a nice video. [Music] [01:05:58] OceanOne is aimed at bringing a new capability for underwater exploration. The intent here is to have a diver diving virtually, creating a robot that can be the physical representation of the human: a robotic diver that has bi-manual capabilities. So it has two hands, it has stereo vision, and the most amazing thing about it is that you can feel what the robot is doing while sitting up on the boat. And
this is combining the technology of haptics, that is, the idea that we can reflect the contact forces. It's almost like you are there; with the sense of touch you create a new dimension of perception. This robot is oil-filled, which allows us to take the robot very deep; this robot can go to thousands [of meters]. This is the Stanford [dream]: a truly human-like machine that is also human-friendly. [Music] [01:07:09] A shipwreck located about 20 miles off the coast of Toulon in France, at 100 meters. In the last year we have been working on getting our robots ready to take on that expedition. We are going to land on the Moon. [Music] More than 70 percent of the surface of the planet is water; we have a lot of structures, a lot of [unclear] to monitor. We need to reach down there; you can think about it as a solution. [Music] [Dorsa] Since I don't have that much time, I'm going to move forward, because probably
the last thing I want to show is the walking video, and then I'll close and take questions. In general, if you're interested in underwater robots, Oussama does a lot of work around that; he's at Stanford, he teaches classes, talk to him. I think that's a very interesting direction. [01:08:13] And then finally, the last video I can show under this category is this idea of walking, jumping, and things of that sort. There's a lot of excitement around that, and Boston Dynamics, the first video I showed you of the dancing robot was also from Boston Dynamics, does a lot of work on building very dynamic robots; they have really good controllers for these robots. [01:08:36] This is Atlas, from Boston Dynamics. It jumps. It flips, even. That is super impressive. And usually roboticists show that one video of it working and don't talk
about videos of it not working, but more recently people are showing more videos of things not exactly working. In this video it's actually super impressive how it recovers, because that is really hard to do in real time. That was a failure. So, okay, lots of excitement around these areas. [01:09:39] I can start taking questions now; I'm not going to show more videos. I have more to show, but let me just answer a couple of questions, and at the end I'll leave this video up, 50 Years of Robotics, that Oussama put together; it has fun music. So, any questions? [01:09:59] [Student] I had a question. I was super impressed by the last video you showed of the robot doing flips. It looked really heavy, like it had a lot of materials, and I was wondering why they chose to equip it with such materials. I thought, you know, maybe using
lighter materials so it could be easier to jump. I was just curious if you knew the reason behind how they designed that robot. [Dorsa] That's a very good question. I don't know the details, because I don't personally work much on walking or the design side of things, so I actually don't know what material they're using. They definitely do consider different types of material and make sure it's lightweight and all that, but I think there are just a lot of joints and a lot going on with that robot. If you're interested in learning more, check out Boston Dynamics' website; they have all their other cool robots there. [01:10:54] [Student] I have a question. I was wondering, what are some examples of state-of-the-art research involving both robotics and NLP, or language-based AI, for example voice-activated
systems or related work? [01:11:11] [Dorsa] Yeah, there's a lot of excitement around NLP and robotics. I actually have a student jointly advised by me and Percy, which is very exciting; this is our first time doing NLP and robotics together. There's a lot of work around instruction following, basically making teaching interactive: when you have a person and a robot at home, how would the robot learn that you care, or don't care, about the house of cards, that type of thing. So thinking about human-robot interaction a little more carefully when you actually have access to NLP, that is one place it shows up. Another place that is a little harder to think about, but I think has a lot of value, is the large data set of natural language that we have. Generally we have a
lot of text data, and if you can learn from that something about context, something about how a robot should cook an egg, I think that is very interesting too. I haven't seen that much work around it, but again, lots of excitement at that particular intersection of NLP and robotics. [01:12:19] [Student] Hi, I had a question as well. Maybe this goes more into the scope of visual recognition, but robots will be playing a part in this too. The world unfortunately will always consist of good actors and evil actors, and for international security purposes there will be a role, if there already isn't one, for robots and autonomous systems. But those same methods can unfortunately also be used for human rights violations. How do you build it? Any technology will always be neutral; it's its use that determines the outcome. But in the case of human rights
violations, how can you build systems so that, against an authoritarian regime that will have the best technology, there would be a way of using technology to evade it? I've seen some work around how to fool facial recognition. How can technology work against technology when it's needed and also serve its purpose? I think it's a tough question. [01:13:12] [Dorsa] It is a very tough question. I'll refer that to Tino next week specifically, but in the case of vision, and in general in the case of using machine learning, I think it is much tougher. In the last lecture I will talk a little bit about this idea of fooling neural networks. There's some recent work basically showing that you can always find adversarial examples, so this idea of trying to safeguard your system so it doesn't get affected by
adversarial examples is just not going to work; there are proofs, by [name unclear] here and others, who have actually shown that you can always find an adversarial example in some settings, under some distance metrics. [01:13:52] In the case of robotics, I've been part of some discussions around this idea of autonomous weapon systems, as I mentioned earlier. In those discussions there are proposals on, for example, the number of drones that can be purchased at the same time, things of that sort. A big concern there is autonomous weapon systems becoming weapons of mass destruction, which is kind of scary as I talk about it. But the discussions around that are about what sort of limitations and regulations can be put in place so that people don't buy too many drones at the same time and weaponize them, things
of that sort. But I'm definitely not an expert in this; I refer you to Tino next week for more details. [01:14:38] [Student] Thank you. Thanks so much. I have a question: since robotics is such an integrated subject, it integrates mechanical engineering, artificial intelligence, power management, and also regulations, what is the biggest limiting factor that prevents robotics from affecting everyone's life, from being widely adopted? [Dorsa] Dealing with uncertainty is still so difficult, right? You have robots on factory floors, confined spaces where they can move around easily; but putting a robot in a world where humans are just walking around it, there are so many reasons a human could walk around it, and figuring
out what those reasons are can [01:15:24] figuring out what those reasons are can be really difficult so in general [01:15:25] be really difficult so in general dealing with uncertainty dealing with in [01:15:28] dealing with uncertainty dealing with in the case of autonomous driving or [01:15:29] the case of autonomous driving or dealing with things like near accident [01:15:31] dealing with things like near accident scenarios that it hasn't seen before uh [01:15:34] scenarios that it hasn't seen before uh all of those like that uncertainty is [01:15:36] all of those like that uncertainty is like a big factor that's not allowing us [01:15:38] like a big factor that's not allowing us to have robots like out there like [01:15:41] to have robots like out there like widely used in our everyday lives [01:15:44] widely used in our everyday lives so given the current ai technology [01:15:46] so given the current ai technology mostly are based on learning algorithms [01:15:48] mostly are based on learning algorithms but if you keep doing the learning [01:15:50] but if you keep doing the learning algorithm that means you can only like [01:15:52] algorithm that means you can only like learn existing behaviors uh so in order [01:15:55] learn existing behaviors uh so in order to deal with these uncertainties are [01:15:57] to deal with these uncertainties are there any efforts to deal with [01:15:59] there any efforts to deal with uncertainties in life or do something [01:16:00] uncertainties in life or do something like a self-generated motion or [01:16:02] like a self-generated motion or self-motivated actions from the robots [01:16:05] self-motivated actions from the robots itself [01:16:07] itself yeah so yeah definitely yeah so there's [01:16:09] yeah so yeah definitely yeah so there's a lot of work around um like actively [01:16:11] a lot of work around um like actively generating these scenarios active [01:16:13] generating these scenarios active learning in this 
domain so the robot had [01:16:15] learning in this domain so the robot had but the robot still has some sort of [01:16:16] but the robot still has some sort of hypothesis space that it can search in [01:16:19] hypothesis space that it can search in right so uh like you have a hypothesis [01:16:21] right so uh like you have a hypothesis space of things that can happen and [01:16:23] space of things that can happen and within that you can search um and yeah [01:16:26] within that you can search um and yeah so so like there are these things that [01:16:28] so so like there are these things that are called known unknowns and unknown [01:16:30] are called known unknowns and unknown unknowns you can't really do much around [01:16:32] unknowns you can't really do much around unknown unknowns better than like other [01:16:34] unknown unknowns better than like other than just randomly experiencing them but [01:16:37] than just randomly experiencing them but for known unknowns yeah definitely like [01:16:39] for known unknowns yeah definitely like there's a lot of work on actively [01:16:40] there's a lot of work on actively looking for the most informative data i [01:16:43] looking for the most informative data i guess another reason that we don't have [01:16:44] guess another reason that we don't have robots widely used is it's such an [01:16:47] robots widely used is it's such an integrated system and it's such an [01:16:49] integrated system and it's such an interconnected system so you have like [01:16:51] interconnected system so you have like the best ai algorithm and all of a [01:16:53] the best ai algorithm and all of a sudden your camera fails [01:16:55] sudden your camera fails like the hardware failure can affect [01:16:57] like the hardware failure can affect like there's so many things that can [01:16:59] like there's so many things that can fail in that pipeline [01:17:01] fail in that pipeline that makes it just such a difficult [01:17:02] that makes it just 
such a difficult system to debug; it's everything coming together.

[01:17:12] So, all right, I'm going to just leave this video on, and at the end of it I'm going to sign off, because it's fun; it has fun music. Osama again made this; Osama is awesome at making music. But if you have more questions about these things, just come to office hours and I'll be happy to answer them. Let's just watch fifty years of the history of robotics. If you guys want to sign off, sign off too; I'll talk to you later. This is seven minutes. All right.

[01:17:47] [Music video: fifty years of robotics history plays]

[01:24:35] All right, that was kind of long, but kind of fun. So this is for repress 2020. Okay. Do they still have 30 people? Oh my god. Okay. All right.
Good seeing you all. That's it; I'll see you at office hours and in our next lecture. Later, bye.

================================================================================ LECTURE 054 ================================================================================
Stanford Talk: Inequality in Healthcare, AI & Data Science to Reduce Inequality - Improve Healthcare
Source: https://www.youtube.com/watch?v=0IZhDmh1dmI
---
Transcript

[00:00:05] All right, let's get started. Welcome, everyone. We're really pleased to have Emma Pierson with us today. Emma actually comes from Stanford; well, not originally, but she spent her undergrad and grad school years at Stanford. She actually took one of my classes when I first started at Stanford. Emma has done a lot of great work in machine learning, in particular addressing fairness, and my group has done some work in fairness, and we always go and ask Emma because she's kind of our go-to resident expert on the topic. Since graduating, she's been spending a year at MSR New England before starting as an assistant professor at Cornell next year. I'm sure she'll have a lot of interesting and important things to say, along with the general theme that we've been trying to go for in these classes, which is how AI really matters and affects people's lives. So please take it away, Emma.

[00:01:13] Thank you, thank you for this invitation. It's a pleasure to be here, to be back at Stanford, if only virtually, and actually to be back specifically in CS221, which was the first computer science class I ever took at Stanford, so it brings back fond memories. And I'm not just saying that to suck up to the professors.

[00:01:30] Okay, so today I'm going to be giving a two-part talk. In the first part of the talk, I'm going to give an overview of some of the recent projects that I've worked on, discussing the social implications of AI and trying to use it to improve people's lives. Then I'm going to tell a little bit of a story about how I got here, just in case it's useful to you as you're trying to unravel your own professional choices.

[00:01:57] At a high level, as Percy said, I use AI and data science for very practical applications, and the specific applications I focus on are reducing inequality and improving health care. Today I'm going to be talking about using AI to study inequality in three areas. First, I'm going to tell you a story about policing and how we can use AI to study inequality in policing; then I'll talk about using AI to study inequality in pain; and finally I'll talk about using it to study inequality in COVID-19.
[00:02:29] So let's jump right into it: let's talk about policing. This is joint work with a number of excellent co-authors whose names I will now attempt to rattle off: Camelia, Jan, Sam, Dan, Amy, Vignesh, Cheryl, Phoebe, Ravi, and Sharad. It's quite a large project, the effort of a ton of people.

[00:02:52] Why is policing something we care about? I think this year that point doesn't really need to be explained. It's obvious that policing has a tremendous impact on communities across the United States; in fact, it's one of the major leading causes of death for young men, particularly young African-American men. Today I'm going to be talking to you about police traffic stops. Why do we care about police traffic stops? Well, they're one of the most common ways we interact with the police: tens of millions of Americans are stopped every year.

[00:03:21] And there's concern that traffic stops may be racially discriminatory. To be clear about what I mean by racial discrimination (I'll make this more precise in a couple of slides): it's when someone is treated more negatively because of their race. So someone is stopped by police because they're Black; they wouldn't have been stopped had they been driving the same way in the same car but been white, for example. Now, this is obviously very bad if it's happening, but it's hard to test for statistically. Let's talk about why.

[00:03:49] The first challenge we confronted when we embarked on this project is that there was no unified dataset tracking every stop made by the police. Rather, each department stores its data in its own little system, in its own idiosyncratic format. So we set about creating this dataset, and we did so in two stages.
[00:04:09] In the first stage, our journalist collaborators submitted data requests to more than 150 police departments over the course of five years. This was a colossal amount of work for them; journalists are amazing collaborators. Of course, then the data comes pouring in and you have this nightmarish data standardization task, where every single dataset is in a different format. So we put in thousands of hours to clean up the data and put it into a standard format.

[00:04:34] Now, the good news for you is that we've made all this data available. If you're looking for interesting datasets on inequality or on policing, this is a publicly available resource which is easy to download. The full dataset tracks some 227 million stops made across 56 city agencies (that's stuff like the San Francisco Police Department) and 33 state agencies (that would be like the California Highway Patrol). In the main analysis I'll be talking about today, we're going to be analyzing 95 million stops. The reason that number is somewhat smaller is that, for example, we have to filter for departments that have enough data to do this analysis at all: if a department doesn't track the race of stopped drivers, it's very hard to analyze racial discrimination.

[00:05:18] In our analysis we look at three different questions: whether the police discriminate in whom they stop in the first place, whether they discriminate in whom they search after stopping them, and how policy changes affect these things. Today I'm only going to talk about the second question, both because it's particularly interesting from a data science and AI methods standpoint and because the methods I'll be describing are applicable to studying bias in many other human decisions, as I'll describe.

[00:05:47] So: are police searches discriminatory? A little bit of context on police searches. After the police stop a driver, they're allowed to conduct a search in order to find contraband. Contraband here means things you're not supposed to be carrying: illegal drugs, weapons, etc. The purpose of a search is to find contraband; they're not supposed to search you just because they're curious, or because they're trying to harass you, or whatever. So, because the purpose of a search is to find contraband, we're going to test whether minorities are searched when they are less likely to have contraband, that is, at a lower threshold of evidence. So if police are searching white drivers, for example, only when they're 40 percent likely to carry contraband, but they're searching black drivers when they're only 20
percent likely to carry contraband, those different thresholds would be discrimination under our definition of discrimination. Importantly, this is only one way the police can discriminate; there are a lot of other problematic things the police can do, as we've seen this year, of course. We're testing for a very specific type of police discrimination; this is not comprehensive.

[00:06:49] So a first simple test of whether the police are discriminating in whom they search is to look at search rates: in other words, how likely is someone to be searched after a stop? The results of this analysis are shown for our data in the graph at right, with state patrol stops on the left and city stops on the right, plotting the average search rate across locations on the y-axis. You can see that there are very big gaps in this plot, with black and Hispanic drivers much more likely to be searched after a stop than are white drivers.
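As a rough sketch, the search-rate test just described boils down to a per-group conditional mean. The table below is a toy stand-in, not the Open Policing Project's actual schema; the column names and records are assumptions for illustration:

```python
import pandas as pd

# Toy stop records; hypothetical schema and data, purely illustrative.
stops = pd.DataFrame({
    "driver_race":      ["white", "white", "white", "black", "black", "hispanic"],
    "search_conducted": [False,   True,    False,   True,    True,    True],
})

# Search rate: P(search | stop), estimated separately for each group.
search_rates = stops.groupby("driver_race")["search_conducted"].mean()
print(search_rates)
```

A gap between groups here is the raw disparity the talk plots, before any adjustment for differences in underlying contraband rates.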
But this by itself does not prove that the police are being discriminatory, i.e., applying different thresholds on the basis of race. It's possible that some races are more likely to carry contraband (drugs, weapons, whatever). The purpose of a search is to find contraband, so if some groups are more likely to carry it, police may be more likely to search them even in the absence of applying different thresholds on the basis of race.

[00:07:42] So a second simple test that's been proposed to get around this problem is to look not at the rates of searches but at the outcomes of those searches. This is called an outcome test, and the idea is that you look at how likely a search is to find contraband; we call that the hit rate. This was proposed by Gary Becker and other economists; it's decades old, and it's a very frequent test in the economics literature.
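The hit rate behind the outcome test is again just a conditional mean, this time over searched stops only. A toy sketch, with hypothetical column names and made-up counts chosen to echo the 90 percent versus 10 percent example that follows:

```python
import pandas as pd

# Toy records of searched stops only (hypothetical schema, made-up data).
searches = pd.DataFrame({
    "driver_race":      ["white"] * 10 + ["black"] * 10,
    "contraband_found": [True] * 9 + [False] + [True] * 1 + [False] * 9,
})

# Hit rate: P(contraband found | search), per group. Under the outcome
# test, a much lower hit rate for one group suggests that group is being
# searched on weaker evidence.
hit_rates = searches.groupby("driver_race")["contraband_found"].mean()
print(hit_rates)  # black 0.1, white 0.9 in this toy example
```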
[00:08:04] The intuition behind this test is: look, if searches of white drivers are finding contraband 90 percent of the time but searches of black drivers are finding contraband only 10 percent of the time, it suggests that police are searching white drivers only when they're very likely to carry contraband, but searching black drivers on the basis of relatively little evidence, which is indicative of discrimination. So if there are differences in the hit rates by race, that's discrimination under the outcome test. And when you do this analysis on our data, you do indeed see that hit rates are lower for black and Hispanic drivers, in both state stops and city stops, than they are for white drivers, suggesting discrimination against minority groups.

[00:08:43] But it turns out that there's a flaw in the outcome test as well. It's called infra-marginality, and I'm going to illustrate it with a simple
hypothetical example. Totally hypothetical: these numbers are made up. Imagine there are two races, black drivers and white drivers, and imagine that within each race there are two groups: those who are very likely to carry contraband and those who are quite unlikely to. These groups are easy to tell apart; you know, maybe one of them is wearing blue hats. Among the likely group, fifty percent of black drivers carry contraband and seventy-five percent of white drivers carry contraband. Among the unlikely group, five percent carry contraband regardless of their race. And importantly, imagine in this hypothetical example that the police are not being discriminatory: they search everyone who is more than 10 percent likely to carry contraband, applying the same threshold irrespective of driver race.

[00:09:33] What are the hit rates for white and black drivers going to be in this hypothetical example? Well, the police are going to search all the likely drivers, and they're going to end up with a hit rate of 50 percent for black drivers and 75 percent for white drivers. So from that difference in hit rates we're going to conclude that there's discrimination in this hypothetical example. But that's a misleading conclusion, because by assumption we're applying the same threshold to both groups.

[00:09:59] So why is this happening? Why are we getting this misleading result? It's happening because the statistic we're looking at, the probability of carrying contraband conditional on being above the threshold, is not the same as what we actually care about, which is the threshold itself. These are simply different quantities, and the threshold itself is hard to infer: it's not directly measurable from the data the way the hit rate is. So the solution that's been proposed is to use a Bayesian latent variable model to try to infer this threshold.
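The infra-marginality example above can be checked in a few lines. This reproduces the talk's made-up numbers (equal-sized groups are an extra assumption of the sketch) and shows the hit rates diverging even though both races face the same 10 percent threshold:

```python
# Made-up populations from the talk's hypothetical; equal group sizes
# are an added assumption of this sketch.
# (race, group) -> (probability of carrying contraband, number of drivers)
populations = {
    ("black", "likely"):   (0.50, 100),
    ("black", "unlikely"): (0.05, 100),
    ("white", "likely"):   (0.75, 100),
    ("white", "unlikely"): (0.05, 100),
}

THRESHOLD = 0.10  # same for both races: non-discriminatory by assumption

def hit_rate(race):
    """Expected hit rate when everyone above THRESHOLD is searched."""
    searched = found = 0.0
    for (r, _), (p, n) in populations.items():
        if r == race and p > THRESHOLD:
            searched += n
            found += p * n  # expected number of searches finding contraband
    return found / searched

# Same threshold, different hit rates: the outcome test's blind spot.
print(hit_rate("black"), hit_rate("white"))  # 0.5 0.75
```

Only the "likely" group clears the threshold for either race, so the hit rate simply equals that group's contraband rate, and the gap reflects base rates rather than discriminatory thresholds.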
I'll tell you about that now. Before I do, though: are there any pressing questions? And am I talking at an appropriate volume? Cool.

[00:10:40] The threshold test proposes a stylized model of a police stop. When I say stylized, what I mean is that you can never capture all aspects of the real world in math; your hope is that you capture enough relevant aspects of the real world to enable you to measure the quantities of interest. In this case, the thing we want to measure is the threshold at which the search is being conducted. So the goal of this model is to estimate the search thresholds that are consistent with the observed data, namely the search rates and the hit rates. And discrimination, just as before, means lower search thresholds being applied in searches of minority drivers.

[00:11:15] So here's how the threshold test models
here's how the threshold test models a police stop. We imagine that when the officer stops someone, they estimate the probability p that that person carries contraband. p captures contextual factors like the age and the gender of the driver, how nervous they're acting, etc., and it's drawn from a risk distribution, which is shown graphically at right. The risk distribution is a probability distribution on the unit interval, so it ranges from zero to one.

[00:11:44] For example, if the police pull over a bus driver, p is probably quite low, because he's driving kids around; hopefully he's not also carrying weapons or drugs. On the other hand, if they pull over a driver who's acting woozy and drinking out of a bottle, that's pretty sketchy, and p is probably higher.

[00:12:05] Now, in order to fit this model at all, you have to make some
assumption about what the risk distributions look like. You can't fit arbitrary probability distributions, because then you would have infinite degrees of freedom. So the parametric assumption that the model makes is that the risk distributions are beta distributions, which is a very standard distribution on the unit interval.

[00:12:26] Now, if p is greater than some threshold, the officer searches the person, and if they search the person, they find contraband with probability p. So in the case of the bus driver, he'd be below the threshold, so the officer wouldn't search him and wouldn't find contraband. In the case of the woozy-acting driver, he would be above the threshold, so the officer would search him and would find contraband with a 75% probability.

[00:12:52] The model allows the thresholds and the risk distributions to vary by race and location, and discrimination, as before, is if lower thresholds are being applied in
searches of minority drivers.

[00:13:03] Now, this being a Bayesian model, you have to specify how you go from the unobserved objects to the observed data. So what are the unobserved objects and the observed data here? The unobserved objects are the thresholds, which are the main thing we care about, and the risk distributions; graphically, that's the dotted line and the blue line in the figure at right. The observed data are the search rates and the hit rates for each race and location. For example, the search rate for black drivers in Alameda County is 30% and the hit rate is 40%.

[00:13:36] So how do we go from unobserved to observed? I've shown this graphically at right. The search rate is the amount of the risk distribution that lies above the threshold: graphically, it's the amount of gray mass. You can also express it as 1 minus
the CDF of the risk distribution. This is intuitive: it's how much of the risk distribution lies above the threshold. The hit rate is the expected value of the risk distribution conditional on drawing from the gray mass; that is, conditional on drawing from the portion of the risk distribution which lies above the threshold, what's your expected value?

[00:14:10] So that's how we go from the unobserved objects to the observed data; that's the likelihood portion of the Bayesian model. To complete the Bayesian model specification you also need a prior: you need to place priors on your parameters. I'm not going to describe that in detail, but basically you place priors on the thresholds and the risk distribution parameters. Now, by combining those two things, the likelihood and the prior, you can use standard Bayesian inference to
infer the posterior over the parameters. And the specific thing we care about is: what is our best estimate of those thresholds, given our observed data?

[00:14:48] Now, unfortunately, it turns out the story I told you is a little too simple, and fitting the model on a data set of our size is much, much too slow. The reason goes back to the fact that the risk distributions are beta distributions: in order to compute the search rate and the hit rate, you have to compute the CDF and the conditional mean of the beta distribution, and that turns out to be very slow, especially when you also have to compute their gradients, which you have to do during model fitting. The exact mathematical details of why, I'm not going to get into, but the TL;DR is that fitting the entire national data set is impossible, and, perhaps more importantly, the test can't be used by the people who really need it.
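To make the likelihood concrete, here is a minimal sketch of the two quantities the model has to evaluate, assuming a Beta(a, b) risk distribution with made-up illustrative parameters (this is not the paper's code): the search rate is 1 - F(t), the hit rate is E[p | p > t], and both can be cross-checked by simulating stops.

```python
# Sketch of the threshold test's likelihood quantities, assuming a
# Beta(a, b) risk distribution. Parameters are made up for illustration.
import numpy as np
from scipy.stats import beta

a, b, t = 2.0, 5.0, 0.3  # hypothetical risk distribution and search threshold

# Search rate: mass of the risk distribution above the threshold, 1 - F(t).
search_rate = 1 - beta.cdf(t, a, b)

# Hit rate: E[p | p > t], using the identity
#   E[p * 1{p > t}] = (a / (a + b)) * (1 - F_{Beta(a+1, b)}(t)).
hit_rate = (a / (a + b)) * (1 - beta.cdf(t, a + 1, b)) / search_rate

# Cross-check by simulating stops: draw a risk p for each driver,
# search if p > t, and find contraband with probability p.
rng = np.random.default_rng(0)
p = rng.beta(a, b, size=1_000_000)
searched = p > t
found = rng.random(searched.sum()) < p[searched]

print(f"search rate: {search_rate:.3f} (simulated {searched.mean():.3f})")
print(f"hit rate:    {hit_rate:.3f} (simulated {found.mean():.3f})")
```

Note that the hit rate comes out above the threshold t, since it averages over everyone above the margin; that gap between the two statistics is the infra-marginality issue from earlier, and evaluating (and differentiating) these beta CDF terms repeatedly inside the fit is the computational bottleneck the talk mentions.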
Journalists, police departments: anyone who doesn't have a ton of compute and a ton of grad students.

[00:15:38] So what we had to do was replace the beta distributions with a new family of probability distributions called discriminant distributions. Describing those distributions in detail is beyond the scope of this talk, although I'm happy to chat with people afterwards if they're specifically interested in probability distributions. But it turns out that this new family of probability distributions makes the test run two orders of magnitude faster, and that makes it feasible to run on a data set of our size.

[00:16:03] I guess a high-level takeaway here is that probability distributions are not just something you learn in CS 109 so you can pass CS 109; they're actually quite practically important, and it's worth paying attention to them and thinking about what their drawbacks are.

[00:16:18] For now,
though, I'm just going to show you the results: now we can actually take this fast threshold test and apply it to our national data set. So here what I'm showing you is the output of this model, the average estimated threshold, where again we're averaging across locations. And you can see that the average threshold is lower for black and Hispanic drivers than it is for white drivers, suggesting that they're being searched on the basis of less evidence.

[00:16:45] So, to summarize what I've shown you from this search analysis, I've shown you three results: that search rates are higher for minorities, that hit rates are lower, and that thresholds are lower. This is a characteristic pattern for discriminatory searches; you'll see the same pattern, for example, if you look at stop-and-frisk data in New York City, which is a very,
very obviously discriminatory policy. All three tests here suggest discrimination against minorities, but the threshold test does so in a way which is robust to the statistical flaws, like infra-marginality, of the simpler tests.

[00:17:19] I mentioned that the same methods can be applied to other data sets where you have a binary decision and a binary outcome, so I just want to give you some quick examples of this. For example, we can apply it in the medical domain, to COVID testing: the binary decision is, does someone get tested for COVID, and the binary outcome is, do they test positive for COVID. And if you see, for example, that minorities who get tested for COVID are much more likely to test positive, then it's a worrisome sign, because it suggests that they're only getting tested at higher thresholds of evidence; they may be being under-tested for COVID. And in fact we do see
some evidence that that is the case. So this is a more broadly applicable methodology.

[00:18:01] Finally, to close on the public policy impact of this work: I mentioned that one benefit of using this different probability distribution is that your test runs 100 times faster, and this makes it easier for journalists to use. And in fact that was exactly what we saw: the Los Angeles Times was able to take our faster test, with some assistance from our team, and use it to show that black and Hispanic drivers in Los Angeles were being searched on the basis of less evidence. In response to that, within about a week, the LAPD announced that they were going to cut back on police searches, in response to these concerns over racial bias. This is why working with journalists and other real-world actors is nice: they help you translate your research findings into
real-world impact.

[00:18:40] Okay, so before I go on to the second story, are there any questions I should answer?

[00:18:46] Yeah, so we have one question from a student asking: in India, police harass the poor based on how someone is dressed, or for two-wheeler drivers, for example. So can this model that you've been describing be applied based on economic status?

[00:19:05] That's... you know, I've given this talk like 50 times and no one has ever asked that question. That's super interesting; I would be curious to hear more. There is nothing in principle which precludes applying it on the basis of economic status.

[00:19:19] Okay, should I go? That's the only question for now.

[00:19:22] Okay, cool. All right, so let's move to our second story, which is about using AI to study inequality in pain. This is joint work with David Cutler, Sendhil Mullainathan, Ziad Obermeyer, and Jure Leskovec; Jure is a professor here, and he also
prefers black-and-white photos, it would appear. Oh, he's also my academic advisor; I guess this is a relevant point.

[00:19:43] Okay. So a general fact about pain is that disadvantaged groups experience more of it. You see this for socioeconomic disadvantage across a variety of types of pain, across multiple continents, across multiple samples; it's quite a robust finding. And you see it for racially disadvantaged groups as well.

[00:20:00] And this is also true in the condition I'll be talking about today, knee osteoarthritis, which is one of the most common causes of disabling pain in older adults. Mechanically, what's happening is that with the wear and tear of time, the padding between your knee bones erodes, the bones grind together, and this causes a lot of pain. And it's very common; odds are good that multiple people listening to this talk will
develop it.

[00:20:23] So, in osteoarthritis, as in other conditions, disadvantaged groups experience worse pain. A natural explanation is: oh, maybe they just have worse osteoarthritis. But here's the interesting thing, here's the fact we're going to try to explain: it turns out these groups have worse pain even when we control for how severe the doctor thinks their disease is.

[00:20:44] So I want to explain what I mean by that, but in order for that to make sense, I have to explain how we measure severity and pain. So how do we measure severity? Basically, a doctor looks at an x-ray of the knee, grades it on a bunch of factors, and gives it a summary score. Specifically, they'll look at an x-ray of the knee and say stuff like: oh, you definitely have an osteophyte, a bone spur; and you have these other features, like the joint space between your knee bones has narrowed. And
so I'm going to give it a score called the Kellgren-Lawrence grade, or KLG, that ranges from zero to four. It's a categorical summary measure where higher scores indicate more severe disease.

[00:21:21] How do we measure pain? Well, you ask the patient a bunch of questions, like: how much pain do you feel when you're bending your knee? Then we take the answers to those questions and aggregate them into a single score called the KOOS pain score. So it's the result of a survey.

[00:21:36] The data we're going to be using comes from the Osteoarthritis Initiative; it's publicly available data. All the results I'm going to be presenting are on about 300 people, and we're going to be comparing pain by three binary groupings. We're going to be comparing black to non-black patients, where almost all the non-black patients in the data set are white, and we're going to be comparing lower- and higher-income
patients, and lower- and higher-education patients.

[00:22:00] So what do I mean when I say disadvantaged patients have more pain? Here what I'm showing you is a vertical histogram with pain on the y-axis, where lower scores indicate worse pain. I'm showing you the histograms for black versus non-black patients, and you can see that there's a big visual difference in the histograms: black patients have worse pain. If you want to summarize it in a single measure, you can just take the difference in means for the two groups, and it's about 10.6 points on the KOOS scale, which is about two-thirds of a standard deviation. So it's a big gap.

[00:22:30] The results for income and education are somewhat smaller, but still substantively large and statistically significant. The numbers I'm showing in parentheses are the confidence intervals.

[00:22:42] So what happens when we control for severity? Does the pain gap go
away? It turns out that it doesn't. So now the graph I'm showing you at right has severity on the x-axis, that KLG score I was telling you about before, and pain on the y-axis as before. The important point from this graph is that the orange and blue lines are not on top of each other: even conditional on severity, there's a gap in pain between black and non-black patients. And if we want to summarize the size of that gap in a single number, the standard way to do so is with a linear regression. Specifically, we run a regression of pain on race and KLG, and that tells us the size of the pain gap when we control for that severity score.

[00:23:24] I've shown those numerical results in the second numerical column: you can see that for race, for example, the pain gap shrinks from 10.6 points, when we don't control for anything, to 9.7 points when we do control for KLG.
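The control-for-severity regression can be sketched on synthetic data (every number below is made up; the real analysis uses the OAI data). The point of the sketch: if the pain gap is driven by something independent of KLG, adding KLG as a control barely moves the coefficient on the group indicator.

```python
# Sketch of "pain gap, with and without controlling for severity" on
# synthetic data; all coefficients here are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)   # hypothetical disadvantaged-group indicator
klg = rng.integers(0, 5, n)     # severity grade, 0-4, independent of group

# KOOS-style pain score (higher = less pain), with a 10-point gap for
# the disadvantaged group that KLG does not explain.
koos = 80.0 - 3.0 * klg - 10.0 * group + rng.normal(0, 15, n)

def gap(controls):
    """OLS coefficient on the group indicator, given control columns."""
    X = np.column_stack([np.ones(n), group] + controls)
    coef, *_ = np.linalg.lstsq(X, koos, rcond=None)
    return coef[1]

print(f"unadjusted gap:      {gap([]):.1f}")     # about -10 points
print(f"controlling for KLG: {gap([klg]):.1f}")  # still about -10
```

Because the simulated gap comes from outside KLG, the two coefficients are nearly identical, mirroring the 10.6 versus 9.7 pattern in the real data.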
The important point is that it really doesn't get all that much smaller, right? 10.6 is almost as big as 9.7; it only gets nine percent smaller. And the results for income and education are similar. So the high-level takeaway is that controlling for severity doesn't do very much to narrow the pain gap. This isn't our unique finding, by the way; other studies find this as well. The goal of our paper is to explain why: why is there a pain gap even conditional on severity?

[00:24:05] Specifically, we're going to try to differentiate between two theories. The first theory we call the "outside their knees" theory: namely, that there are non-knee-related factors which are causing disadvantaged patients to report higher pain even when their knee disease is no more severe. And this isn't just some crazy theory we plucked out of thin air; a bunch of prior work points to some factors
that might cause higher pain in disadvantaged groups: maybe higher life stress, differences in access to pain medication, differences in how different groups report pain. There are a whole bunch of possibilities. The commonality, though, is that whatever the factor is, it isn't anything that can be seen in a knee x-ray; it's something outside the knee.

[00:24:47] But there's a second possibility, right? And we call this the "in their knees" theory: namely, that there are pain-related ailments in the knee x-ray which KLG isn't capturing, and if we could capture these physical features, we would be able to explain more of the pain gap. So under the first theory, there's nothing to be seen in the knee x-ray that would explain this gap; and under the second theory, there is something to be seen that KLG isn't picking up.

[00:25:16] So why is the second hypothesis plausible? Here are two reasons. The first
is that we don't understand pain all that well. This is true generally; it's also true in osteoarthritis specifically: KLG just doesn't explain all that much of the variation in pain. And a possible reason for this is that KLG was developed decades ago in heavily white British populations, and so it's plausible that it's not capturing all the environmental or occupational features that may be relevant to pain in modern and more diverse populations that may live and work very differently.
[00:25:47] So we're going to try and test whether there are overlooked physical features in the knee which would explain the higher pain levels in disadvantaged groups. This isn't just an academically interesting question; it's also a question with concrete clinical implications, and the reason is that whether you get knee surgery depends on whether the source of your pain is in your
knee. If you go to the doctor in a lot of pain, and she looks at your knee and says, "I'm sorry, I can't see what's wrong with it," she's unlikely to get you knee surgery for an apparently healthy knee; she's more likely to prescribe non-specific therapies like opioids or other painkillers. In contrast, if you go to the doctor in a lot of pain and she says, "Aha, I know exactly what's wrong with you: you have very severe radiographic arthritis, you're a four on the Kellgren-Lawrence scale," then it's much more likely under clinical guidelines that you'll get some kind of surgical intervention. Consequently, if KLG is missing true sources of pain within the knee in disadvantaged groups, these groups may be under-referred for surgery.
[00:26:43] Okay, so we're going to try and test this, and methodologically what we're going to do is train a convolutional neural
network (this is how you know this is sophisticated, because we're using deep learning) to search for additional signal in the knee x-ray which would explain the higher pain levels in disadvantaged groups.
[00:27:00] So what does that actually mean, how are you going to search for additional signal in the knee x-ray? Well, the standard approach to searching for signal in a medical image is to train a model to replicate the doctor's clinical judgment: to train it to predict KLG. The problem, though, is that if KLG doesn't capture all the pain-relevant features, we don't want to just replicate it. We don't want to set a ceiling of clinical knowledge when, by hypothesis, that clinical knowledge might be biased or incomplete.
[00:27:26] So instead, what we're going to do is train the model to learn from the patient, by predicting the KOOS pain score. So to be very clear: the input to the model is an x-ray of the knees, and the output is a knee-specific pain prediction.
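The point about the choice of prediction target can be illustrated with a toy example. Everything below is synthetic and hypothetical (made-up linear "features," not real x-rays or the study's actual pipeline); it just shows how a model trained to replicate a grade that misses some pain-relevant features inherits that ceiling, while a model trained directly on reported pain does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 8

# Synthetic "image features": the first 4 drive the clinician grade;
# features 4-7 also matter for pain but are invisible to the grade.
X = rng.normal(size=(n, d))
grade = X[:, :4] @ np.array([1.0, 0.8, 0.6, 0.4])
pain = grade + X[:, 4:] @ np.array([0.9, 0.7, 0.5, 0.3]) + rng.normal(scale=0.5, size=n)

def fit_predict(features, target):
    w, *_ = np.linalg.lstsq(features, target, rcond=None)
    return features @ w

# Model 1: replicate the clinician (predict the grade), use that as the pain prediction.
pain_hat_clin = fit_predict(X, grade)
# Model 2: learn from the patient (predict reported pain directly).
pain_hat_pat = fit_predict(X, pain)

def r2(y, yhat):
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"R^2 for pain, grade-trained model: {r2(pain, pain_hat_clin):.2f}")
print(f"R^2 for pain, pain-trained model:  {r2(pain, pain_hat_pat):.2f}")
```

In this toy setup the pain-trained model recovers the extra features by construction; the real question, which the talk turns to next, is whether the same thing happens with actual x-rays.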
We call this prediction ALG-P, for algorithmic severity measure.
[00:27:43] And if controlling for this algorithmic severity measure, ALG-P, narrows the pain gap more than does controlling for the clinical severity measure, KLG, it implies that the clinical severity score is overlooking knee features which might explain disadvantaged patients' higher pain levels.
[00:28:01] Before I go to the results, any questions about the setup?
[00:28:11] In terms of comparing the pain gaps between different factors, like income and race, do we have to consider overlap between the groups?
[00:28:20] Yeah, that's a great question. There is overlap between the groups; there's correlation between all three of these binary variables. Each of the individual pain gaps remains statistically significant even when you control for all three at once. You could probably do an analysis where you sort
of controlled for all three at once, so that might be an interesting thing to do. Here, to keep the exposition as clear as possible, we looked at each group separately; but yeah, it's a good point, they're definitely correlated.
[00:28:50] Great, I think that's it for now.
[00:28:54] Okay. So our first result is that the algorithm does in fact find additional signal for pain in the knee x-ray: the algorithmic severity score ALG-P predicts pain better than the clinician severity score KLG. The r-squared is higher, the difference is statistically significant, and you see similar results for other predictive measures. But those r-squareds are really not that high, right? R-squared ranges from zero to one; if we're at 0.16, that's not all that high. And it's not the central question of our analysis anyway, which is: does controlling for the algorithmic severity score reduce the pain gap?
[00:29:26] And it turns out that the
answer to that second and more important question is also yes. So here, the first column is just what I showed you before: it says that when you control for KLG, the pain gap doesn't get that much smaller. But the second column is new: it says, when you control for the algorithmic severity score ALG-P, how much smaller does the pain gap get? And the final column gives the ratio of the two columns. So for race, for example, you can see that the algorithm explains 43 percent of the pain gap, while KLG explains only nine percent; the ratio of those two numbers is 4.7. The overall implication is that yes, there is overlooked signal in the knee x-ray which helps explain disadvantaged patients' higher pain, so this supports the "in their knees" hypothesis.
[00:30:12] Yes, you should never fit a neural net without doing a lot of robustness checks, whatever current computer science practice may be.
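Mechanically, "controlling for" a severity score and asking how much the gap shrinks is a comparison of regression coefficients. A minimal sketch with synthetic data (the coefficients and resulting numbers are all made up; only the mechanics mirror the analysis described above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000

# Synthetic data: a binary disadvantage indicator, a severity score that is
# worse on average in the disadvantaged group, and pain driven by both
# severity and a residual group effect.
group = rng.integers(0, 2, size=n)          # 1 = disadvantaged
severity = rng.normal(size=n) + 0.8 * group
pain = 5.0 * severity + 3.0 * group + rng.normal(size=n)

def gap(controls=None):
    """Coefficient on `group` in an OLS of pain on group (+ optional controls)."""
    cols = [np.ones(n), group]
    if controls is not None:
        cols.append(controls)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, pain, rcond=None)
    return beta[1]

raw = gap()
controlled = gap(severity)
print(f"raw pain gap:            {raw:.2f}")         # ~ 3.0 + 5.0 * 0.8 = 7.0
print(f"controlled for severity: {controlled:.2f}")  # ~ 3.0
print(f"fraction explained:      {1 - controlled / raw:.0%}")
```

The "fraction explained" line is the quantity being compared across the KLG and ALG-P columns in the table.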
And so we do a lot of them. I'm not going to talk about them now, but I'm happy to talk about them more later if people have specific questions.
[00:30:26] I do, though, just want to talk about two accessory results. The first is that a diverse data set improves performance. Specifically, we compare training the model on a non-diverse train set, from which we've removed all Black patients, to a diverse train set, from which we've removed the same number of non-Black patients. So the size of the train set remains the same; we've just altered its racial diversity. And what we find is that while both models beat KLG, using a diverse train set further boosts performance: you get a better r-squared, and you get a bigger reduction in the pain gap. You see similar results for income and education as well. So to put this within the broader context of AI in
medicine: there's been a lot of concern that training data sets may not be sufficiently diverse. And this is actually more broadly true than AI in medicine; this is true in medicine, full stop. So this sort of testifies to the importance of collecting diverse data.
[00:31:22] And then finally, to speak about the clinical implications: as I said, one of the clinical implications of having good severity scores is that they influence the way surgery is allocated. So we decided to test how using algorithmic pain scores would affect the way surgery is allocated. To test this, we replicate a previous study, and we assume knee surgery is given to patients with high pain and severe disease, so you have to satisfy two criteria. And we try measuring severity in two different ways: using KLG, the clinician severity score, and using ALG-P, the algorithmic severity score.
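The allocation test described here boils down to a conjunction of two criteria per knee. A toy sketch (synthetic thresholds and rates; in this made-up setup the algorithmic score flags a superset of the knees the clinician score flags, which is an assumption for illustration, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

pain = rng.normal(size=n)
# Two severity measures for the same knees: here the algorithmic one
# (hypothetically) marks additional knees as severe.
klg_severe = rng.random(n) < 0.15
algp_severe = klg_severe | (rng.random(n) < 0.15)

high_pain = pain > 0.5

# Surgery rule: high pain AND severe disease.
eligible_klg = high_pain & klg_severe
eligible_algp = high_pain & algp_severe

print("eligible under KLG:  ", eligible_klg.sum())
print("eligible under ALG-P:", eligible_algp.sum())
```

The comparison of those two counts, within a disadvantaged group, is the quantity the talk reports next.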
And we find that because ALG-P gives disadvantaged patients higher severity scores, it's in turn more likely to recommend them for surgery. For example, among Black patients, roughly twice as many knees were eligible for surgery when using the algorithm's severity measure as opposed to KLG's.
[00:32:12] So to summarize: we trained a deep learning algorithm to predict pain from knee x-rays. Our algorithm finds overlooked signal in the x-ray which helps explain disadvantaged patients' higher pain, and a clinical implication is that these disadvantaged groups may be under-referred for surgery.
[00:32:27] To put this within the broader context of AI in medicine and AI fairness: there's been a lot of previous and very important work on how machine learning methods can potentially increase disparities in medicine and in other high-stakes domains, and that's
super important. But we should also keep the more optimistic flip side in mind: machine learning and AI give us predictive superpowers, and they shouldn't inherently be a bad thing if we're wise enough to apply them properly. Specifically, here we show how machine learning methods can also reduce disparities, by detecting signal that humans miss. Key to our results here, key to reducing rather than increasing disparities, is, first, the choice of the prediction task: we didn't just try and replicate clinical knowledge. And second, we trained the model on a diverse data set, and we showed that that contributes to our results.
[00:33:13] Any questions about this before I go to the third and final story?
[00:33:17] Yeah, so we have a question from the first section's slides: can the Bayesian threshold test be applied where the observed data is the
output of an algorithm?
[00:33:33] I mean, you would have to give me more details, but I'm intrigued. The test is designed to assess bias in decision making, so whether the decision maker is human or algorithmic, you could apply it to both, I would say. In the case of an algorithm, it's likely that, at least in principle, someone knows the threshold, right? So it might be easier to just figure out the actual source code or procedure behind the algorithm rather than attempting to infer it. But there still might be some algorithmic settings where you don't know that threshold (for example, it's some third-party company and they won't tell you what they're doing), and then in principle you might want to apply it there.
[00:34:14] And then, on the line of determining whether or not
something is [00:34:20] determining whether or not something is discriminatory or biased um [00:34:22] discriminatory or biased um what metric would you suggest for [00:34:24] what metric would you suggest for testing if something like compass is [00:34:26] testing if something like compass is discriminatory [00:34:28] discriminatory so how do you know if in algorithms [00:34:30] so how do you know if in algorithms uh that [00:34:32] uh that that's a big question i you know i i [00:34:34] that's a big question i you know i i would say [00:34:35] would say it is highly context dependent um you [00:34:39] it is highly context dependent um you know if you observe large disparities in [00:34:42] know if you observe large disparities in things like you know in the case of [00:34:43] things like you know in the case of compass you see these big disparities in [00:34:44] compass you see these big disparities in like fpr and tpr fall spots are right to [00:34:46] like fpr and tpr fall spots are right to a positive rate um that should certainly [00:34:48] a positive rate um that should certainly be a red flag that you want to dig [00:34:50] be a red flag that you want to dig deeper on but then you want to try and [00:34:51] deeper on but then you want to try and understand like why are these things to [00:34:53] understand like why are these things to rising and how can i ameliorate the [00:34:55] rising and how can i ameliorate the situation i don't think [00:34:57] situation i don't think i i would not say like in all cases use [00:34:59] i i would not say like in all cases use auc and that is your golden answer you [00:35:02] auc and that is your golden answer you know no i don't think so [00:35:07] should i go [00:35:09] should i go yeah i think you're good to go [00:35:11] yeah i think you're good to go okay cool so um now i'm going to move to [00:35:13] okay cool so um now i'm going to move to our final story on inequality um this is [00:35:15] our final story on 
joint work with Serina and with Pang Wei, so I'm a little nervous, because Pang Wei will actually know if the details are wrong here. Serina is a computer science PhD student in Jure's lab. We also worked with Jaline, who's an epidemiologist at Northwestern, with Beth and David, who are sociologists, and with Jure, who's a computer scientist. So it's very interdisciplinary work, because we're studying inequality in COVID-19, and intuitively that draws on people in a bunch of different domains.
[00:35:47] Okay. So as you know, viruses like COVID-19 spread through human contact; that's why I'm giving this talk remotely rather than in person. Which is to say, there is an underlying contact network which modulates the spread of the virus.
[00:36:02] So under a simple epidemiological model, an infected person can infect anyone she comes into contact with, with some probability.
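This kind of cascade (each infected person infecting each of her contacts with some probability, those contacts then doing the same) is easy to simulate directly. A toy sketch on a random contact network; this is a minimal SIR-style illustration, not the model from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
p_edge = 0.02    # probability any two people are in contact
p_infect = 0.3   # per-contact transmission probability per step

# Symmetric random contact network (True = the two people meet).
upper = np.triu(rng.random((n, n)) < p_edge, 1)
contact = upper | upper.T

infected = np.zeros(n, dtype=bool)
recovered = np.zeros(n, dtype=bool)
infected[0] = True  # patient zero

for _ in range(50):
    # Number of currently infected contacts for each person.
    n_exposures = contact[infected].sum(axis=0)
    # Chance of escaping infection is (1 - p_infect) per infected contact.
    p_sick = 1 - (1 - p_infect) ** n_exposures
    new = ~infected & ~recovered & (rng.random(n) < p_sick)
    recovered |= infected  # infectious for one step, then recovered
    infected = new

print("total ever infected:", (infected | recovered).sum())
```

The structure of `contact` is doing all the work here, which is the talk's point: how faithfully you estimate that network determines what the simulation can tell you.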
Those people then infect their contacts, and then you get this incredible spread of the disease across the network.
[00:36:18] So because this network is so important to the spread of the disease, current models often attempt to estimate it in some way, so they can simulate the spread of the virus. But they often have to use simplistic estimates of the underlying contact networks, because intuitively it's very hard to know who everyone comes into contact with, unless you're living in some kind of surveillance state.
[00:36:37] So people do this in various ways. They might assume, for example, that anyone can infect anyone, so the network is fully connected. Or you might use some kind of network which captures trends at a very macro level, for example an airline network, which connects city to city but doesn't tell you anything
about the network within a city. Or you might use historical data and say: I'm just going to assume that what patterns looked like in 2016 are what they look like now.
[00:37:02] Intuitively, though, having really crude estimates of the contact network is not enough, for a couple of reasons. The first is that we're undergoing an incredibly dramatic change in human mobility, probably the most dramatic in any of our lifetimes (hopefully in any of our future lifetimes also, right?): we have these stay-at-home orders, reopening policies, everything is crazy. And the second is that we often want to ask very fine-grained questions that depend on mobility in a very fine-grained way. For example, we might want to know the impact of fine-grained reopening policies, like what happens if I open restaurants from 3 to 4 p.m. on Saturdays but not on Wednesdays, or something like
this. We also might want to understand inequality in infections, by race or by socioeconomic status, due to mobility patterns. And intuitively, if we want to do that, we need to understand mobility at a fine-grained level: simply understanding how New York is connected to LA won't be very useful for helping me understand disparities in infection rates within New York, for example between rich and poor New York neighborhoods.
[00:38:07] So because we have to understand this mobility network in a fine-grained way, our approach is a two-step approach. In the first step, we're going to try and estimate the human contact mobility network, and then we're going to try and build a model to capture transmission on this network. So let's talk about each of these steps in turn.
[00:38:26] So how do we estimate this network? Well, we're going to use cell phone mobility data from a company called
SafeGraph. Specifically, that data is going to tell us how many hourly visits there are from a neighborhood to a place. What do I mean by neighborhood? This is a census block group, which you can think of as a fairly fine-grained census area with a couple hundred to a couple thousand people. A place, which I'll refer to as a POI throughout the talk, is a point of interest, like a restaurant or a cafe or a religious establishment; you can think of POIs broadly as places people go when they're not at home. So our cell phone mobility data set basically gives us some sense of the number of hourly visits from a neighborhood to a place.
[00:39:08] So mathematically, what we're going to try and estimate is a network that links CBGs (neighborhoods) to POIs (places). You can think of this in various ways. You could think of it as a list of matrices, a list of networks,
where each network represents traffic at one hour, or you could think of it as a three-dimensional cube, where the dimensions are neighborhoods, places, and time slices. But that's the object we're going to try to estimate. [00:39:37] The problem we run into, though, is that the cell phone data that SafeGraph provides doesn't actually give us an exact estimate of that hourly network. The data they give us for the number of visits from CBGs to POIs is only at a weekly or monthly level, because of the way they aggregate their data, and they also censor it for privacy reasons. [00:39:57] So in terms of the actual data that we have: we have the number of hourly people going to each POI, the number of hourly people leaving each CBG, and then we have a noisy estimate of the networks connecting POIs to CBGs. So you can think of it as the number of people going out, the number of people
coming in, and then a noisy estimate of the matrix linking going out to coming in. [00:40:21] Now it turns out, luckily, that there is a machine learning algorithm which is designed exactly for this scenario, and which you will learn about if you're lucky enough to work with Pang Wei and the other people in Percy's lab (this is very much Pang Wei's work, and it's very fundamental to this project), and it's called iterative proportional fitting. [00:40:38] Basically, it's designed for exactly this setting. It says: let's imagine that you're trying to estimate some matrix, and you know the row sums of that matrix, and you know the column sums of that matrix, and then you have a noisy estimate of the matrix itself. [00:40:51] IPF is an algorithm that will give you back a matrix which is consistent with those row sums and column sums, and subject to that
constraint, is as similar as possible in terms of KL divergence to the initial noisy matrix. And that's exactly the setting that we're operating in here, so we use IPF to estimate the true mobility networks from the noisy SafeGraph data. [00:41:14] So that's a little mathy, a little abstract; let me give you a picture. Here what we're showing you is an example from the Chicago MSA, and we're showing you two time slices: the first, on the left, comes from early March, and the second comes from early April, after social distancing measures have started to take effect. The gray lines here represent the number of hourly visits from a CBG to a POI. [00:41:38] So you can see two things from this visualization. First, the density of the gray lines decreases, indicating that total mobility has decreased from March to April.
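The IPF step described above is simple enough to sketch in a few lines. This is an illustrative implementation of mine, not the project's code; names like `noisy`, `row_sums`, and `col_sums` are my own, and a real run would need more care with zeros and convergence checks.

```python
import numpy as np

def ipf(noisy, row_sums, col_sums, iters=200):
    """Iterative proportional fitting: repeatedly rescale a noisy
    non-negative matrix until its row sums (e.g. visitors leaving each
    CBG) and column sums (visitors arriving at each POI) match the
    observed marginals. The result is the matrix closest to `noisy`
    in KL divergence that satisfies both sets of constraints."""
    M = np.asarray(noisy, dtype=float).copy()
    for _ in range(iters):
        M *= (row_sums / M.sum(axis=1))[:, None]  # match the row sums
        M *= (col_sums / M.sum(axis=0))[None, :]  # match the column sums
    return M

# Toy example: both marginals must share the same total (10 visits here).
est = ipf(np.array([[1.0, 2.0], [3.0, 4.0]]),
          row_sums=np.array([4.0, 6.0]),
          col_sums=np.array([3.0, 7.0]))
```

In the setting of the talk you would presumably run something like this once per hour, seeding `noisy` with the aggregated weekly or monthly network.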
And second, most of the lines are vertical, indicating that people mostly hang around their own homes, and that makes sense. [00:41:57] Okay, so now we've got our network. Honestly, if you didn't understand any of the math, that's fine; the main point is we have a network linking POIs to CBGs at an hourly level. Now we have to put a disease transmission model on top of this network, and this relies on a pretty simple epidemiological model. I'm going to give you a 30-second crash course in epidemiology, and then you'll know about as much as I do about epidemiology. [00:42:20] So let's describe the model now. A very standard model in epidemiology is called an S-E-I-R model, and probably some of you have heard of this if you've been reading the news. [00:42:28] The basic idea is that people move through four states in that order, S-E-I-R; you can't go in any other order, you can't go back,
and there are no loops. So how does this work? You start at the beginning, before a disease has entered a population, in the susceptible state, which is to say you don't have the disease, you've never had the disease, but you're susceptible to it. [00:42:46] Now, if you come into contact with someone who's infectious, you can move to the exposed state, which is to say you now have the virus but you're not infectious yourself yet; it's sort of in your body, but at low levels. [00:42:58] After some period of time, you move from exposed to infectious, meaning you have it and you can infect other people. And then, after some further period of time, you move to the removed state, which is to say you no longer have the disease and you can't catch the disease; maybe you've recovered, maybe you've died, but in any case you can't catch it again. [00:43:16] So what we're going to do is, at each hour of our
simulation, for each neighborhood, each CBG in our simulation, we're going to model the fraction of people in each of these four states. So we might say: in neighborhood five, at hour four, ninety percent of people are in the susceptible state, seven percent are in the exposed state, one percent are in the infectious state, and two percent are in the removed state. [00:43:39] And then we're going to update that hour by hour. [00:43:44] So we have to model transitions between these four states. Two of the transitions, the last two, are pretty straightforward and boring and don't depend on mobility: we just say at each time step you have some constant chance of transitioning to the next state. [00:44:00] But intuitively, the first transition, that S-to-E transition, is going to depend a lot on mobility, because whether or not you get sick depends on whom you come into contact with. [00:44:12] So how do we model this
critical S-to-E transition? [00:44:15] We assume that infections can occur in two ways: at CBGs and at POIs. You can think of CBG infections as: you're just hanging around your house, but unfortunately someone in your house is sick, and so now you're sick. You can think of POI infections as: you went out to a bar, there was someone in the bar who was sick, and now you yourself are sick. [00:44:35] So we assume that the CBG infection rate is just proportional to the fraction of a CBG which is infected; intuitively, if more people in your neighborhood are sick, it's more likely that you yourself will get sick. [00:44:47] The POI infection rate is a little bit more complicated. We assume that the probability of getting infected at a POI is proportional to the fraction of the POI which is infected, times a POI-specific factor which captures specific features about the
POI, like how big it is and how long people stay there. So intuitively, places that are smaller and more crowded are more dangerous, and that's what this part of the simulation is capturing. [00:45:18] A nice thing about this model is that it's relatively simple: for each city we're only going to have three free parameters, which remain fixed over time in spite of the dramatic changes in human mobility. Two of those free parameters are going to scale those two types of infections, infections at CBGs and infections at POIs, and then we're also going to have a parameter which scales the initial conditions in the model: what fraction of people started infected. The rest of the parameters we're just going to take from the prior literature; we're not going to estimate them at all, and this is important because it minimizes concerns about overfitting. [00:45:53] Okay.
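Putting the pieces above together, one hour of the simulation might look like the following heavily simplified sketch. All names here (`beta_home`, `beta_poi`, `psi`, and so on) are my placeholders rather than the paper's notation, and the real model is richer than this.

```python
import numpy as np

def step_hour(S, E, I, R, W, psi, beta_home, beta_poi, sigma, gamma, pop):
    """One hour of a simplified SEIR update on a CBG-to-POI network.
    S, E, I, R: per-CBG fractions in each state; W: hourly visit counts
    from each CBG (row) to each POI (column); psi: per-POI risk factor
    (capturing size, dwell time); beta_home, beta_poi: the two free
    infection-scale parameters; sigma, gamma: constant E->I and I->R
    rates taken from the literature; pop: CBG populations."""
    visitors = W.sum(axis=0)
    poi_inf = (I @ W) / np.maximum(visitors, 1.0)  # infected fraction per POI
    # S -> E: infections at home plus visit-weighted infections at POIs.
    lam = beta_home * I + beta_poi * (W / pop[:, None]) @ (psi * poi_inf)
    new_E = np.minimum(S * lam, S)
    new_I = sigma * E  # E -> I at a constant rate
    new_R = gamma * I  # I -> R at a constant rate
    return S - new_E, E + new_E - new_I, I + new_I - new_R, R + new_R

# Tiny two-CBG, two-POI example hour.
S0, E0, I0, R0 = (np.array([0.9, 0.8]), np.array([0.05, 0.1]),
                  np.array([0.03, 0.05]), np.array([0.02, 0.05]))
W0 = np.array([[10.0, 0.0], [5.0, 5.0]])
S1, E1, I1, R1 = step_hour(S0, E0, I0, R0, W0, np.array([1.0, 2.0]),
                           beta_home=0.01, beta_poi=0.1,
                           sigma=0.2, gamma=0.1,
                           pop=np.array([1000.0, 2000.0]))
```

Running this hour by hour over the IPF-estimated networks would give the kind of simulation described in the talk; note how the POI term makes smaller, more crowded, higher-`psi` places more dangerous.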
How are we actually going to choose those three free parameters? We're going to do what's called grid search: we're going to look over all possible parameter combinations of those three free parameters for each city. [00:46:04] How are we going to choose which one is best? Well, we're going to take real COVID case data, the number of COVID cases every day, from the New York Times, and we're going to keep the parameter combination which gives us the best fit to real cases in terms of RMSE. [00:46:18] Now, in order to capture uncertainty in the parameters, we're actually not just going to show results from that best-fit set of parameters; we're also going to use all parameter settings which yield an RMSE within 20% of that best-fit RMSE. And that captures the idea that, look, our parameters are somewhat uncertain here, and we want to capture that uncertainty. Some of you might be thinking,
oh, Bayesian inference or something might be a more principled way to do this. Totally agree; please figure it out and write to us, I think that would be awesome. Oh, but why didn't we do that? Because it's computationally difficult as it is to fit this model, and that would have been a further computational difficulty, but I think it's an interesting direction for future work. (And inference is only taught next week.) Oh, nice, so hopefully... did they understand this stuff at all? Well, you'll understand it even better next week, that'll be great. Anyway, next week you can figure this out for us. That sounds good. Okay, cool. [00:47:18] Anyway, we model early March to early May, and we chose that time period because that's what was available while we were doing this analysis.
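The fitting procedure described above (exhaustive grid search scored by RMSE against reported daily cases, keeping every setting within 20% of the best) can be sketched as follows. `simulate` is a hypothetical stand-in for running the full model with one parameter triple; the names are mine, not the paper's.

```python
import itertools
import numpy as np

def rmse(pred, obs):
    """Root-mean-square error between predicted and observed cases."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))

def fit_by_grid_search(simulate, observed_cases, grids):
    """Score every combination of the three free parameters against
    reported case counts; return the best-fit triple plus all settings
    within 20% of the best RMSE, to reflect parameter uncertainty."""
    scored = [(rmse(simulate(*params), observed_cases), params)
              for params in itertools.product(*grids)]
    best_rmse, best_params = min(scored)
    kept = [p for r, p in scored if r <= 1.2 * best_rmse]
    return best_params, kept

# Toy check: pretend daily cases depend only on the sum a + b + c.
best, kept = fit_by_grid_search(lambda a, b, c: np.full(5, a + b + c),
                                np.full(5, 6.0),
                                grids=([1, 2], [2, 3], [1, 3]))
```

A Bayesian treatment, as the speaker notes, would quantify the same uncertainty more formally, at a higher computational cost.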
[00:47:26] Cool, okay. So to make things a little more concrete, I want to show you a short video of how this model looks over time. Is this actually going to work? Praying... okay. [00:47:38] Okay, so what's going on here? I'll just talk you through the three graphs in turn. The graph at right is showing you mobility over time; this is not from our model, it's from the raw data. The y-axis is the number of visits to POIs; you can think of it as a measure of overall mobility. And you can see that it drops pretty dramatically about three weeks into the simulation, so three weeks into March, as social distancing takes effect. [00:48:01] The middle graph is model output. It's showing you the fraction of people the model thinks are in each of the four states, and it's a logarithmic graph. And you can see that the
fraction of people in the E, I, and R states, i.e. those who've had the disease, rises over time. And you can also see how mobility is feeding into the model: for example, if you look at that E state, you can see a very high frequency wiggle, just like there is a very high frequency wiggle in the mobility patterns. Those are daily changes in mobility, over the course of the day, and that's basically telling you, look, people are more likely to get sick when they're going out in the middle of the day than in the middle of the night. [00:48:38] And finally, that graph at the left is showing you spatially, geographically, where the model thinks people are most likely to get sick. Redder indicates that a larger fraction of the population is in one of the infected states, and you can see that especially red segment in the middle of the city. I'll return to that point in a bit.
[00:49:02] So, okay, anyone can make pretty graphs; does this actually fit the data? Yes, it turns out it does fit observed case count data reasonably well. Here the orange x's are reported cases and the blue is the model prediction, and you can see that it fits the observed data reasonably well, even if, as in the left plot, you only fit the model on data prior to April and then see how well it performs on data from April to May: it continues to fit the data reasonably well. [00:49:31] This isn't just true in Chicago; that's not a cherry-picked example. It fits data pretty well across cities. [00:49:37] And it turns out that it also fits the data better than two baselines that we tried comparing to. But I think the high-level point here is not that we have some super-duper predictive model; the high-level point is, look, we have this model that fits the data reasonably well but also enables you
to ask very fine-grained questions. So let's talk about some of those fine-grained questions now. What are some of the questions you can ask with this model? [00:50:03] So this model lets you ask what would have happened if we had done something differently: what if we had started distancing a week later? What if we had distanced only 50% as much as we actually had? [00:50:14] It can help you ask stuff like: what are the riskiest locations, the riskiest POIs? Are there POIs which are likely to be super-spreader locations because they have a ton of people? [00:50:24] It can help you answer questions like: what's the impact of different reopening strategies? What happens if you reopen POIs only halfway, for example only to half of their maximum capacity; what do infection rates look like under that scenario? [00:50:37] And finally, it can help you understand why socioeconomic and racial disparities arise. Today I'm actually only
going to talk about that fourth question; the answers to the other questions you can find in our paper, and I think there are probably other interesting questions you can ask as well. The basic point, though, is that because the model captures mobility from neighborhoods to individual places in such a fine-grained way, you can ask a lot of questions that naturally flow from that fine-grained mobility network. [00:51:08] Okay, so let's talk briefly about disparities. So we know that socially disadvantaged racial and socioeconomic groups were hit harder by COVID-19: higher case rates, higher death rates.
The disparities are very dramatic; that's not our work, that's prior work, and it's very clear, very striking. [00:51:25] So there are a bunch of reasons for this, right? It's not all mobility; it's stuff like pre-existing conditions, differences in access to care, worse care when they do get into the hospital, etc. But mobility is probably part of it too. We know, for example, that if you are of lower socioeconomic status it's harder for you to work from home, more likely that you're an essential worker, more likely that you have to go out and do these dangerous jobs and expose yourself to risk of infection. [00:51:50] So it's interesting to ask: first, does our model learn that disparities flow in part from mobility? Like, can the model naturally predict the emergence of these disparities? And second, if it does, can it expose the
mechanisms via which these disparities arise? [00:52:06] In order to study this, we don't actually have data on individual people, so what we do is we compare neighborhoods: we compare higher- and lower-income neighborhoods, for example, and we look at how infection rates vary. [00:52:19] So a first result is: yes, the model does predict the emergence of these disparities based on mobility patterns alone. Here the left graph is showing you disparities by income and the right graph is showing you disparities by race. On the x-axis, what we're plotting is how much likelier people are to get infected: for the left graph, if you're in a lower-income CBG, and on the right graph, if you're from a less white CBG. [00:52:44] And you can see basically that all those boxes, all the blue boxes, are to the right of one, indicating that people are likelier to get infected under the simulation if they're from a lower-income or a
less white CBG. [00:52:58] So the model is predicting these SES and racial disparities, socioeconomic and racial disparities, on the basis of mobility patterns alone. [00:53:06] And because the disparities by socioeconomic status are particularly dramatic, I'll focus on those for the rest of the talk, but you can see the results for both in the paper. [00:53:16] So why is this happening? Well, we show two mechanisms via which it arises. The first you probably already guessed: people from lower-income and less white CBGs weren't able to reduce their mobility as much; they had to go out more, and this is probably in part because of differences in occupation, since they're more likely to be essential workers. [00:53:36] But the second mechanism is a little subtler, and it's this: when they do go out, they go to places which are smaller and more crowded, and therefore more dangerous, and this
is true even within the same type of POI. So even conditional on "I went out, I went to a restaurant," the people coming from lower-income CBGs tend to go to restaurants that are smaller, more crowded, and more dangerous, and that's the second thing contributing to these infection-rate disparities. [00:54:06] So I want to show an example of this for Philadelphia, which is the place where we see the most striking disparities; let's see if I can get this to play. [00:54:15] Okay, so this graph on the left is showing you Philadelphia, and it's showing results over time, and what you can see over time is that this big red spot emerges in the middle of Philadelphia. And where is that? Well, it turns out to be the place with the highest population density, that's the top right plot, and it's also the place with the lowest income. So this very high-density, low-income area has
higher predicted infection rates in our model, and that's happening because the POIs people are going out to are smaller, more crowded, and more dangerous. [00:54:52] A final implication is that the model can look at the predicted impact of reopening plans for people in lower-income deciles, as opposed to the population as a whole. And basically what we show is that reopening plans often have larger predicted impacts for people in lower-income deciles, so for people in poorer neighborhoods, than for the population as a whole. So when you do consider a reopening plan, it's important not just to consider the overall impact but also the impact on poorer neighborhoods. [00:55:22] And in fact, California is starting to consider doing things like this: you have to look at racial disparities in reopening and racial disparities in impact; you can't just look at the impact
on the population as a whole. [00:55:33] This is also good practice, by the way, when you're evaluating the impact of an algorithm: you shouldn't just look at how it performs on the population as a whole; you need to also look at how it performs on different subgroups. [00:55:45] So, takeaways. This approach showcases the power of fine-grained mobility networks. We show that even a simple model leads to accurate fits in 10 different American cities, metropolitan statistical areas. We show that it can scale, even to large networks with lots of places and lots of people. And because you can capture these very micro trends, down to neighborhoods and locations by the hour, you can perform detailed analyses that can potentially inform more equitable responses to COVID-19.
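The subgroup check described above, comparing each group's infection rate to the overall rate rather than reporting only a population-wide number, can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual pipeline: the group names and counts below are made up, and the real analysis compares census block groups (CBGs) by income decile and racial composition.

```python
# Hedged sketch of per-subgroup evaluation: compute each group's infection
# rate relative to the overall population rate. A ratio above 1 corresponds
# to the "blue boxes to the right of one" in the talk's figure.
# All numbers here are invented for illustration.

def infection_rate(infected, population):
    return infected / population

def relative_risk_by_group(groups):
    """groups: dict mapping group name -> (infected, population).
    Returns dict mapping group name -> infection-rate ratio vs. overall."""
    total_infected = sum(i for i, _ in groups.values())
    total_population = sum(p for _, p in groups.values())
    overall = infection_rate(total_infected, total_population)
    return {name: infection_rate(i, p) / overall
            for name, (i, p) in groups.items()}

# Made-up simulation output: the lower-income group comes out above 1,
# the higher-income group below 1.
sim = {
    "bottom_income_decile": (900, 10_000),   # 9% infected
    "top_income_decile":    (300, 10_000),   # 3% infected
}
ratios = relative_risk_by_group(sim)
```

The same pattern applies when evaluating any algorithm: compute the metric per subgroup and compare it to the aggregate, rather than trusting the aggregate alone.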
I think a general question that I would have for people, and I don't know if we want to talk about this now or at the end or not at all, is: what are other questions you might want to answer with this model? [00:56:27] Because I think there are a lot of other things you can potentially ask beyond what we have asked, and I'm curious as to your thoughts. Should we return to that point at the end, though? I don't know how we're doing on time. Yeah, we have 20 minutes left, so maybe we can take some questions now and then we'll move on. Sounds good. [00:56:46] Cool. So actually, if you guys have working mics, do you want to read out your own questions? [00:56:51] So my question was if you are able to take into account the percentage of people that wear masks. We are not; that's a great question. You're, uh, reviewer two, three, one, I don't know how many reviewers have had that question. We do not attempt to take
into account the fraction of people wearing masks, and I think that's an interesting direction for future work. [00:57:13] Hi, I suppose this sort of goes along with the question you put at the end of the slide, which is what other questions you might answer with this model. I was wondering: could this model of mobility be used to analyze other mobility issues that don't revolve around health or epidemiology, such as how different types of zoning codes, or access to and use of public transportation in different CBGs, affect the mobility of those neighborhoods? [00:57:37] Yeah, absolutely. SafeGraph data is very broadly relevant to social science and other questions of mobility, and we're using it for other projects as well; it's definitely a gold mine for other questions. So yes, and yes. [00:57:55] I was wondering, do you think it's possible that we can make connections
between, like, physical mobility between CBGs and POIs, and whether that somehow correlates to the degree of socioeconomic mobility within CBGs? [00:58:13] Yeah, and you might look at Susan Athey and, uh, Gentzkow; Athey and Gentzkow would be the names to look up. They look at socioeconomic segregation, sorry, they look at racial segregation, using SafeGraph data, but then they correlate it with other measures of economic opportunity, using work from Raj Chetty, I think, in their paper. So that's absolutely something you can do; I mean, causal claims are hard, but it's so interesting. [00:58:39] And I think that's it. Cool, all right, let's go on then. Great. [00:58:45] Okay, cool. So I was asked to speak briefly about how I ended up on this path and doing this kind of work, just in case it's helpful to people, so I attempted to
write this down. [00:58:58] Okay, so, you know, I liked math and physics and other similarly nerdy stuff ever since I was a little kid. This is a picture of me dressing up as a chessboard for Halloween, so you can tell I was super cool and definitely had a ton of friends. [00:59:11] I took my first AI class in high school, but I was the only girl in the class and I had a lot less experience than the boys, and some of them made fun of my lack of experience and told me I was the worst in the class. So by the time I got to Stanford, I had actually decided I was not particularly good at computer science, and I came to Stanford as a physics major and did not even take any CS classes my first year. [00:59:38] But in my second year at Stanford, I decided I should give computer science another try, and so I actually enrolled in this class, which at that point was not taught by Percy. I
don't know who, who were the teachers? No? Mm-hmm. So that, I think, and some others; I don't even remember who taught the class. I do remember the class was awesome, and honestly, this is not propaganda, it was actually true: I thought computer science was super cool. [01:00:04] And that summer I started doing computer science research in a physics lab; I was developing algorithms to identify certain types of galaxies. But I realized something was missing. I thought AI was amazing, but I didn't want to use it to study galaxies that were millions of light-years away; there were too many problems that were closer to home. [01:00:24] And this was really driven home for me a few months later, when I got a genetic test which told me I carried a mutation that gave me a very high risk of getting cancer. It's called
a BRCA mutation; some of you might have heard of it. [01:00:35] And as you can imagine, this was pretty difficult news to receive as a 20-year-old, and I spent the next few months being pretty upset about it. [01:00:43] During this period, when I was trying to come to terms with the news, I came across this paper out of Daphne Koller's lab. She was an AI professor at Stanford at the time; I'm sure many of you have heard of her. These days she's in industry. [01:00:55] In her paper, they take images of cells from cancer patients and apply a computer vision model to try to predict whether the patients will survive. These days you would probably do this using deep learning; back then they were using old-school computer vision. I thought it was the most amazing thing, and more importantly, it gave me hope. I thought: fine, I have this cancer problem; I'll work on this problem with AI. I knew
that I wasn't going to cure cancer, but I thought that working on it and learning about it would make me feel better; understanding the things that frighten you often does. [01:01:25] And so I wrote to Daphne, and she wrote back on New Year's Day and offered me a spot doing research in her lab, and I took it. [01:01:31] There's a lesson I take from this kind of tough period in my life, which is that crying is underrated, or as Gandalf puts it in The Lord of the Rings, "not all tears are an evil." I found tough times like this one to be very useful in crystallizing what matters to me; from that point on I became much more intensely focused on what I wanted to do in life. [01:01:51] So then I graduated from Stanford with a bachelor's in physics and a master's in computer science, and I decided to take a job at 23andMe, which was a genetics company that offered very cheap testing for the BRCA
mutation, which I carried. Their test was so cheap that my little sister, who was then only a teenager, could afford to get tested, and she learned that she did not carry the same mutation that I did. And I thought that was amazing, you know, expanding access to genetic testing in that way, so I accepted a job at 23andMe. [01:02:19] But about a month after I arrived at 23andMe, the US government sent them a letter ordering them to stop selling their health-related tests, because they hadn't gotten the proper regulatory approval. So they could no longer sell their BRCA test, which was the whole reason I had gone to the company in the first place, and for the entire year I was there, I basically didn't do any BRCA research at all. [01:02:38] Which I think is another important lesson: even if you start out on a path with the best of intentions, it's very easy to get really derailed, at
least for me, and it's very hard to predict what projects will pan out. [01:02:51] Then I went back to school to start my graduate research. I was still very motivated to do cancer research, and over the next couple of years I wrote a half-dozen papers developing AI methods for computational biology. But I began to feel my work was unsatisfying, because I was still too far away from real people's lives. [01:03:09] Early in my graduate research, my grandpa, who carried the same genetic mutation that I do, died of brain cancer. We were very close, that's us playing chess up there when I was little, and I wrote a fair bit of my master's thesis next to his hospital bed. [01:03:23] The thesis develops new dimensionality-reduction methods, like a fancy PCA or factor analysis, basically, if you've heard of those things, for a certain type of biological data which is important in cancer and many other settings. And work like that, work
like what I was doing, just felt very far away, decades away, from helping people like my grandpa. Which is not to say that no one should do it; I think it's super important that there are people doing that kind of fundamental research, even if it doesn't touch real people's lives for a long time. But I began to feel that it wasn't for me, that I wanted something that was going to help people in the short term if I was going to be happy with the sort of research I was doing. [01:04:02] So based on that understanding, I started working on datasets where each row was not a cell or a gene or something very abstract, but a person. I kept working on healthcare problems, and I also started working more on inequality. That took me forward to some of the problems that I've told you about today, studying things like inequality in pain, in policing, and in COVID, which feel very concrete
to me. [01:04:31] Looking back at my research, I see a lot of failures and wrong turns. I went to 23andMe to research BRCA, and I failed to do that. I went to grad school to study cancer, and I mostly failed to do that. I've spent more than 10,000 hours of my life getting a PhD, and I think it's fair to say that many of those hours have not made anyone's life better. [01:04:50] There's a lot of time running down blind alleys, and even when you do have a good idea, there's a lot of time polishing it and repolishing it to get it published. And even when you do publish it, often very few people read it; and even when people do read it, does it actually change their minds? [01:05:04] A few months ago, a man contacted me because he was writing a New York Times piece about our work on policing that I was just telling you about, and I went back and forth with him, meticulously trying to make sure he stated
our conclusions accurately; I'm sure he was very sick of me. And when the piece finally came out, I read the New York Times comment section, and it was obvious that none of the commenters were actually reading our research; they were just spouting what they already believed. And that is probably the project I've gotten to work on which has been most impactful. [01:05:36] But even though I've spent so much of my life failing to do good, I still think it's important to try, and that's the final topic I want to discuss. I outlined this part of the talk the night Ruth Bader Ginsburg died. I heard the news, and I knew I wasn't going to be able to write any more code that day, so I decided I'd just walk until I felt better. But unfortunately, it got very dark and cold before that happened. So you'll forgive me if this comes across as a little moralistic or maudlin; it wasn't the best evening.
[01:06:04] But before I tell you why I think you should try to do good rather than just making a lot of money, I want to acknowledge that there are students watching this talk who really do need to make a lot of money when they graduate. You have families to support, you have huge student loans, these are frightening economic times, and if that describes you, I'm not going to lecture you, and you should please feel free to ignore this bit. [01:06:24] Still, I can almost promise that for some of you listening to this talk, there will come a point, not too long from now, where you will have a choice between multiple jobs which are all fun, all interesting, and all pay you more money than you could possibly need as a young person; that's what the Stanford computer science salary survey shows for the last six years I have data for. [01:06:45] And when that moment comes, I'm asking you to choose a job
that makes the world better, and not just in some trivial way, and not just for the very richest people. I'm not asking you to donate a kidney or storm the beaches at Normandy or risk your lives treating COVID patients. I'm asking you to choose to make a large amount of money as opposed to an obscene amount of money. It's just not that big a sacrifice in a world with such desperate problems, where we've gotten so lucky. And I also think you'll find that you'll get more enjoyment out of whatever money you do make if you feel like you earned it doing something meaningful. [01:07:19] The other reason I think we're compelled to fight for good is that there are a lot of people doing the opposite. I don't want to get political about this, but I think we've all seen just how catastrophic the consequences of that can be. So if we who are given the most power to
push the world in the right direction take morally neutral or morally harmful careers instead, the world will slide in the wrong direction. I am only here giving this lecture, many of you are only here listening to it, because people like Ruth Bader Ginsburg woke up every day for decades vowing to push the world in the right direction, to expand the circle of people allowed to be in classes like this one. She could have gone into corporate law instead; apparently she had a taste for Armani suits, she could have bought a lot more of them. [01:08:02] I think ultimately a lot of us take high-paying, socially neutral, or socially harmful jobs not because we really need all that money but because we've internalized the implicit and insidious claim that if we make a lot of money, we're good engineers, we've made it in life, we're worthy of respect. We need to break that link. We need to redefine what it means to be a good
engineer. A few weeks ago I got an email from a recruiter from a big finance firm, and I responded the way I typically do: I told him I don't work for finance firms. And he asked me why I didn't want to work with the best engineers in the world. And I thought: the best engineers in the world think about the social implications of their work. The biggest factor determining your impact will not be whether you understand all the variants of gradient-based optimization, although you, like, should put some effort into learning those, both because they are very useful and so Pogba, like, won't kill me. Um, the biggest factor determining your impact will be the problems you choose to work on. That's what makes a great engineer. [01:09:02] I'll close with a quote from Ta-Nehisi Coates' Between the World and Me, which is a letter he writes to his son, who is about your age. He writes: I would have
you be a conscious citizen of this terrible and beautiful world. This is what I would wish for my child, and for my students, and for myself, and for you as well. Thanks very much for listening, and I'm happy to take any further questions. [01:09:24] Uh, hi, well, thank you very much, this was impressive, listening to you. Um, well, I was thinking while you were talking about this medical impact, and how to study that, especially with regards to the bias, uh, in the population. So, um, there are quite a lot of, actually, biases, cognitive biases, that doctors can show while making important medical decisions. Are we able to study somehow, I mean, how this happens, so, just to help them avoid and eliminate or reduce this kind of risk? [01:10:18] Yeah, totally. I mean, I think this sort of, like, behavioral economics approach to, like,
let's understand doctors' biases and put them in terms of sort of these common heuristics that people use, these common biases that people have, um, is a broad and promising line of research. I'm not a behavioral economist. For one example of this type of work, I would point you to a recent paper, it's called, like, "Who Gets Tested for Heart Attack and Who Should Be," um, and it sort of studies, you know, how do these cognitive, uh, heuristics that people use play into decisions like this. The authors you should look for are Sendhil Mullainathan and Ziad Obermeyer, and there may be some other authors as well. But the broad answer to your question is yes, you know, you can absolutely study doctors' biases in terms of sort of things we know cognitively about people and how they make decisions. [01:11:06] Thank you. [01:11:10] Uh, the other, uh, could you read out your question? [01:11:17] Sure, yeah, just about the difference
between races. Um, I'm not sure if there's any study, actually, um, about this: whether any difference in, you know, IQ, happiness, is what actually caused the difference, or whether there's a difference as, I guess, either a correlation or a consequence. Is there any study looking deeper into this to understand the difference? [01:11:46] Uh, I mean, there are a lot of studies looking at differences by race and ethnicity. Uh, this is a fraught topic. You know, some of the studies have not been good studies. So, like, in particular, studies of racial differences in IQ, I think it's a very fraught topic. And then there's stuff which is, you know, not at all fraught, like, let's look at racial differences in, I don't know, incidence of breast cancer, or deaths from breast cancer. So yes, I mean, there's a lot of research in this area, um, of varying quality, uh, but a
lot of it is super important. It is extremely difficult to figure out causality here, and many studies that claim they can are pushing a particular political agenda and should be treated with a lot of skepticism. [01:12:33] Okay, thank you. [01:12:40] Okay, uh, since we are about time... [01:12:48] Yeah, sounds good. So if there's no other questions, uh, let's, um, if you can mute and clap, that would be great. I'll count to three so we can all give him a really big round of applause. That was an amazing talk. [01:13:11] Thank you, it's a pleasure. Thank you for the great questions.
================================================================================
LECTURE 055
================================================================================
Fireside Talks: Artificial Intelligence (AI) and Language
Source: https://www.youtube.com/watch?v=pI72PseZQo8
---
Transcript
[00:00:05] Okay, great, let's get started. So welcome everyone to the fireside chat, or talk, on AI and language. Um, so today we're going to do something a little bit different. I want all of you to go to Slido,
and the guest code is cs221. So we're going to try to use this platform for doing Q&A, and also I'm going to have a number of polls throughout the talk. So the first question, if you click here, you'll hopefully see that it's: what city are you in right now? [00:00:38] And I already got some responses: San Jose, Palo Alto, Stanford, Seattle, um, College Station, Fort Mill, Cupertino, New York City. So welcome, everyone, from all over. Um, and if you go, oops, I guess this Zoom thing is in the way, okay, I guess I can't see that, but anyway, you can go to, there should be a Q&A tab, um, where you can go and type in your questions there. I'll try to monitor that throughout the hour. [00:01:20] Okay, um, all right. So I want to start by asking you a very simple question: what is the difference between these two cute little kittens and these two kids here? Anyone know the
answer? [00:01:33] Both can see, smell, taste, move around the environment. Kids are sometimes cute too. What's the main difference? Let's make this interactive, someone just shout out an answer. [00:01:56] Humans can talk. Humans can talk, yes, thank you. That is the main difference. And while animals do have some sorts of communication, especially songbirds and dolphins, and honeybees have their waggle dance, none, I think, could boast as rich and complex a language as the human language. So it's really, I think, language is something that's uniquely human and defines who we are. [00:02:24] So before getting into talking about AI and NLP, I want to spend some time just talking about why language is special, and hopefully we can get a richer appreciation for language. [00:02:37] So if I had one slide to summarize language, um, this would be it. So this is one of my favorite xkcd
comics. Um, some of you have probably seen it, but I'll just read it anyway, because I think it really highlights the right mode. So: "Anyway, I could care less." "I think you mean you couldn't care less. Saying that you could care less implies that you care at least some amount." "I don't know. We're these unbelievably complex brains drifting in the void, trying in vain to connect with one another by flinging words out into the darkness. Every choice of phrasing and spelling and tone and timing carries countless signals and context and subtext and more, and every listener interprets these signals in their own way. Language isn't a formal system, it's glorified chaos. And you never know for sure what any words will mean to anyone. All you can do is try to get better at guessing how your words affect people, so that you have a better chance of finding ones that will make them feel
something like what you want them to feel. Everything else is pointless. So I assume you're giving me tips on how you interpret words because you want me to feel less alone. If so, then thank you, that means a lot. But if you're just running my sentences past the mental checklist so you can show off how well you know it, then I could care less." [00:03:56] So what do we learn from this? So, first, language is social. It's meant for communication, right? I think a lot of us, coming more from a kind of data or ML background, think language is just a body of text, but it's really this dynamic thing that humans invented to communicate with each other. The other thing is that language, or talk, is cheap, um, and something about language requires an incredible amount of trust between the people so that it actually can function. But, interestingly, it can also be used
to deceive, which is interesting, right now. And it's just kind of miraculous how language allows us to express all these different thoughts, from poetry to math to how to fix a bike, and so on. [00:04:47] So where did language come from? The short answer is no one really knows, and it's really hard to pinpoint, because while writing came around 3000 BC, before then there was a long period of spoken language, and spoken language doesn't leave fossils or anything. Um, and there was so much skepticism, kind of controversy, around that, that the study of the origins of language was actually banned for about a hundred years, um, in Paris. So, but we do, just kind of conservatively, put an estimate: it started somewhere between maybe 2.5 million years ago, when Homo sapiens first came on the scene, and sometime around 100,000 years ago, which is when modern humans really
started doing things. Which is a huge range, but to put it in perspective, this is a very recent development compared to the history of all of life on Earth. [00:05:43] Um, and we know that it served an evolutionary purpose. So if you read Sapiens, this book by Yuval Harari, language is perhaps one of the key reasons why Homo sapiens became so dominant, because it allowed you to communicate and coordinate on such massive levels, for example when coordinating on a hunt, or communicating about food sources, and so on. Um, and interestingly, language allows you to talk about things that aren't here and now. That is probably one of the most powerful things, and in fact it allows you to talk about things that don't even exist. There's a whole genre called fiction that's about that, and things which are in the abstract. So,
um, in contrast, like I said before, you know, our kind of sister fields of, you know, computer vision and robotics tap into capabilities that have been around for much, much longer. Like, vision is over 500 million years old, and language is, you know, barely, let's say, conservatively, maybe a million or two years old. [00:06:54] So, uh, just for fun, let me do a poll. Um, let me actually, I have to create the poll first. So: what language do you speak, what languages do you use? Let's see, multiple choice, free text, okay, let's see if this works. I have to disable, okay. So go to the poll and I'll let you fill that out. [00:07:24] So we know that there's not one language, there are multiple languages. And furthermore, languages have evolved. So you can draw kind of a giant family tree of languages, and this branch just shows the Indo-European languages,
which cover all of Europe and Iran and some parts of northern India, which developed like 10,000 years ago. And this branched off into kind of Germanic languages and Romance languages; Germanic went into German and English, and so on. And today there are, um, 6,500 languages, many of which are actually going extinct, because language, again, is social, so if you don't have anyone to talk to, your language just kind of disappears. [00:08:12] And language is changing all the time. You know, English has definitely evolved since Shakespeare. But, you know, I think in grade school you're probably told that "they" is supposed to be plural, and you shouldn't use, say, "they" to refer to a singular person. But now, uh, especially with kind of this trend for, um, having gender-neutral pronouns, "they" is, you know, kind of proudly singular, and Merriam-Webster kind of declared it the word of the year
in 2019. Um, you can think of internet slang and emojis as also kind of a continuation of language into the kind of digital sphere, and so on. [00:08:49] Um, okay, so I'm getting a lot of English, Mandarin Chinese, Japanese, so, you know, quite a bit of, uh... Python 2, Python 3. Yes, very nice. [00:09:04] Okay, so one thing that I often get asked is, you know, of all these languages, are some harder or more powerful than others? Um, it's clear, and I think widely accepted, that all languages are kind of basically equivalent. But there is kind of this hypothesis from around the 1920s, called the Sapir-Whorf hypothesis, that says the structure of language affects speakers' world views. And you see this in kind of fiction, like George Orwell's 1984, which talks about a new language called Newspeak, which was simplified so that they could make
sure people couldn't even think to critique their government. Um, this has been challenged by a universalist school, Chomsky and Pinker, who think that language and thought are universal, and all the differences are very superficial, governed by a few parameters. [00:09:57] And it is true that languages do differ. They are largely the same: most languages have nouns and verbs that refer; we are all humans living in the same world. Um, but there are some differences. You know, one, for an example, is that English lacks what is called, uh, clusivity, which is the distinction, when you say "we," it's ambiguous whether you mean to include the listener or not to include the listener, whereas some languages, like Tamil or some Chinese dialects, actually have that distinction. Or, Mandarin Chinese lacks the distinction between past tense and present tense, but of course has other ways of, you know,
accommodating for that [00:10:41] ways of you know accommodating for that um [00:10:42] um so one question uh maybe to just have [00:10:45] so one question uh maybe to just have another poll is [00:10:47] another poll is let me stop that poll is do you believe [00:10:52] let me stop that poll is do you believe believe that language shapes thought [00:10:56] believe that language shapes thought now [00:10:58] now i know that these uh questions are [00:11:02] i know that these uh questions are um you know obviously not binary but i [00:11:04] um you know obviously not binary but i just wanted you to kind of you know get [00:11:06] just wanted you to kind of you know get a gut feeling are you leaning more [00:11:08] a gut feeling are you leaning more towards yes or no on that and i will [00:11:11] towards yes or no on that and i will activate that [00:11:12] activate that poll so if you are do you kind of abide [00:11:16] poll so if you are do you kind of abide more by the uh superior warf hypothesis [00:11:18] more by the uh superior warf hypothesis that the structure of language does [00:11:20] that the structure of language does influence how you think about the world [00:11:22] influence how you think about the world or do you think that uh all humans are [00:11:25] or do you think that uh all humans are really the same and as we just happen to [00:11:27] really the same and as we just happen to learn different languages and those [00:11:29] learn different languages and those differences are fairly minor [00:11:34] so can you guys see the [00:11:36] so can you guys see the um [00:11:38] um the numbers [00:11:40] okay so we have [00:11:42] okay so we have about 90 percent superior dwarf um [00:11:46] about 90 percent superior dwarf um and about 10 no okay [00:11:50] and about 10 no okay so uh this is a richly hot hotly debated [00:11:53] so uh this is a richly hot hotly debated topic in linguistics even you know to [00:11:56] topic in linguistics even you know to 
this day.

[00:11:59] Another fascinating thing about language is that we're not born knowing it. Babies can make sounds, but it takes them a few years to actually acquire language. And importantly, despite what their parents might think, they're not taught from explicit instruction or from teachers; rather, they learn naturally from immersion in language. By the time they're five, they have a fairly fluent grasp of the language and speak grammatical sentences. Language acquisition is also very multimodal and grounded: language accompanies sight and sound and actions and touch. And it's active: you can't teach a child by just putting them in front of a TV and expecting language acquisition to occur.

[00:12:53] So one of the big questions around language acquisition is the nature-versus-nurture debate, which affects other areas as well. The big question: is language innate? Chomsky, the famous linguist, came up in the 1950s with an idea he called the poverty of the stimulus. He said that the sentences a child hears can't possibly be responsible for the richness of language that's exhibited in actual humans. So he concluded that a large part of language must be innate.

[00:13:55] So, while trying to delete the last poll and add a new one at the same time as I'm talking, I'll ask you this: do you think language is innate? If you think about it, he does have a fair point, because as children we hear so few examples, not nearly enough to really capture all the cases; we constantly run into new language all the time; we have to generalize compositionally to longer sentences and new contexts; and we all land on the same language. So I think he does have a point. On the other hand, he wasn't an experimentalist; he was a classic armchair linguist who thought deep thoughts about how things should be. And one could imagine: what about the role of grounding? Maybe it's part of the experience that really shapes language acquisition, and maybe we are just really malleable.

[00:15:07] It seems like everyone's quite divided on this: 53 percent yes and 47 percent no. Okay, that's always fun, so maybe you guys can talk about it with your friends. What I will say, oh, now this is a tight race, what I will say is that no matter where you are on the spectrum, once you have kids you really think there's more innateness than not. Now it's equal, 50-50. Okay, great. Let's move on.

[00:15:44] Let's take a look at language itself; I'm going to introduce some basic concepts. There's a whole field, linguistics, that studies language, and if you're interested I really encourage you to take a linguistics class; I think it's one of the most interesting, eye-opening experiences. But I'll just go over some basics quickly. Here's a sentence: Beethoven was born in Bonn and displayed his musical talents at an
early age. Now, what's going on in this sentence? Linguists ask: what is the structure of the sentence that allows us to understand what it means?

[00:16:18] First of all, there's tokenization: the sentence is just a stream of characters, and tokenization is the process of converting that into words. Seems very simple, right? But as we'll see later, it's not as simple as it looks. Part-of-speech tagging is the idea that some words are nouns and other words are verbs, and some verbs are past tense versus present tense. Parsing goes a step further and captures the grammatical relationships between words; for example, "displayed" has a subject and an object, and the subject in this case is... what is the subject? Anyone? I know there are English speakers in the audience. Is it Bonn? Okay, I think it's Beethoven, right?
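The steps just described, tokenization plus a crude stand-in for named entity recognition, can be sketched in a few lines of Python. This is a toy illustration, not how real NLP systems work: the regular expression and the capitalization heuristic are my own simplifications, and a trained model would be needed in practice.

```python
import re

def tokenize(text):
    # Split a character stream into word and punctuation tokens.
    # Real tokenizers must handle clitics ("don't"), hyphens, URLs, etc.;
    # this single regex is only a toy rule.
    return re.findall(r"\w+|[^\w\s]", text)

def naive_ner(tokens):
    # Toy named-entity spotter: flag capitalized tokens that are not
    # sentence-initial (sentence-initial words are capitalized anyway).
    return [t for i, t in enumerate(tokens) if i > 0 and t[:1].isupper()]

sentence = "Beethoven was born in Bonn and displayed his musical talents at an early age."
tokens = tokenize(sentence)
print(tokens[:6])          # first few word tokens
print(naive_ner(tokens))   # ['Bonn'] (misses sentence-initial 'Beethoven')
```

Note how even this heuristic exposes the ambiguity the lecture mentions: "Beethoven" is missed precisely because capitalization at the start of a sentence carries no information.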
Okay, so even though "Beethoven" is very far away from "displayed," it's nonetheless grammatically very close to "displayed." This is a property of language: language has to be linearized, so not everything can be close to everything else. [00:17:42] Coreference resolution, or anaphora resolution, corresponds to the fact that some words are pointers to other words or concepts: "his" refers to Beethoven. Named entity recognition is the task of identifying which entities, usually proper nouns, are people or locations or organizations.

[00:18:04] So what is a word? Let's go down into a word. Here's the word "light." It seems like a word is pretty straightforward, at least in English. But the problem is that the unit of meaning, what conceptually should be a word, actually goes beyond a single word: "light bulb" is a unit, and "light" alone doesn't capture the full meaning. Sometimes the meaning unit is within a word: "lightening" isn't just a blob; it consists of "light" plus the suffixes -en and -ing, which turn it into a verb and then a gerund construction. There are also cases where words have multiple meanings, usually called word senses: in each of these sentences, "light" has a different meaning, and we figure it out based on context.

[00:19:02] Conversely, some meanings have multiple words that refer to essentially the same thing: that's synonymy. This also happens with sentences, which is called paraphrase: multiple sentences can get at the same meaning. One huge caveat is that there are no true equivalences between any two words or sentences; there are always subtle differences in meaning, so you can think of it more as a kind of distance metric. [00:19:33] There are also notions of relations between words, like hyponymy, which is "is-a" relations, and meronymy, which is "has-a" relations. These allow you to do entailment, which asks whether one sentence logically implies a second sentence. You can think of entailment as a problem that embodies a lot of different tasks: if you could solve entailment, you could do question answering, sentiment classification, and so on.

[00:20:11] I haven't been monitoring the Q&A; I don't think there are any questions. If you have a question, maybe just holler.

[00:20:21] So this is all about lexical semantics, the meaning of words. Then we talk about
compositional semantics. [00:20:30] Compositional semantics is a rich tradition that goes back to logic. This is Frege, who was a logician at the turn of the 20th century, and there are two ideas, model theory and compositionality, which I'll explain. The first is that sentences are just symbols; it's a convention that we say "block 2 is blue," and what the sentence means has to be associated with what is in the world: there's a world in which block 2 is actually blue. This is an important distinction that we gloss over, we don't even think of it, because language is so natural. The second is compositionality: the meaning of the whole is built from the meanings of the parts. Compositionality is the key thing that allows us to build more complex meanings out of smaller units, and it's probably the reason we can generalize to all sorts of new contexts: we've learned the meanings of the words, and we know how they combine, and that's how we can interpret new sentences in new contexts.

[00:21:49] Quantifiers, I think, are really interesting. "Every" is a word that says, well, it's hard to explain language without using language, "every" means every; hopefully the pictures tell you what's going on. And "some" is an existential quantifier. There's also quantifier scope ambiguity: "every non-blue block is next to some blue block" could mean that for each non-blue block there is some blue block next to it, where that blue block could be different for each one; or it could mean that there exists a single blue block that is next to every non-blue block. So language is ambiguous.

[00:22:38] Modality involves words like "must" and "can," and has to do with possible worlds: block 2 is blue in all of these possible worlds, but block 1 is red in only one of them. [00:23:02] Beliefs are interesting. We know that Clark Kent is the same person as Superman, and naively you might think we can substitute the two in all contexts because they're equivalent. But "Lois believes that Superman is a hero" is not the same as "Lois believes that Clark Kent is a hero," and this has to do with the fact that "believes" sets up an opaque context, in which you can't just do substitution.
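To make model theory and the scope ambiguity concrete, here's a minimal sketch in Python. The world (a row of colored blocks) is my own made-up arrangement, not the one on the lecture slides; the point is that the two readings of "every non-blue block is next to some blue block" are different formulas and can disagree on the same world.

```python
# A toy world: block id -> color (blocks sit in a row in id order).
world = {1: "red", 2: "blue", 3: "blue", 4: "green"}

def blue(b):
    return world[b] == "blue"

def next_to(a, b):
    # Neighbors in the row differ by 1 in position.
    return abs(a - b) == 1

# Reading 1: for every non-blue block x, there is SOME blue block y next to it,
# possibly a different y for each x:
#   forall x ( not Blue(x) -> exists y ( Blue(y) and NextTo(x, y) ) )
reading1 = all(any(blue(y) and next_to(x, y) for y in world)
               for x in world if not blue(x))

# Reading 2: there is a SINGLE blue block y next to EVERY non-blue block:
#   exists y ( Blue(y) and forall x ( not Blue(x) -> NextTo(x, y) ) )
reading2 = any(blue(y) and all(next_to(x, y) for x in world if not blue(x))
               for y in world)

print(reading1, reading2)  # True False in this world: the readings diverge
```

In this world, block 1 has blue neighbor 2 and block 4 has blue neighbor 3, so reading 1 holds; but no single blue block is adjacent to both, so reading 2 fails. That divergence is exactly what "scope ambiguity" means.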
There's much more to be said about this if you study linguistics; I just want to give you a flavor for how language can be quite subtle. [00:23:41] Here are some other examples, from pragmatics. Conversational implicature is the phenomenon where there's the sentence you say, but there's additional meaning beyond that sentence. If two people are talking, and A says "What on earth happened to the roast beef?" and B says "The dog is looking very happy," sure, the dog is looking very happy, that's a sentence, but really the implicature is that the dog ate the roast beef.

[00:24:12] Presupposition is subtle but different: it's a background assumption that's independent of the truth of the sentence. If I say "I have stopped eating meat," what's the presupposition? That I was once eating meat. So regardless of whether I have stopped eating meat, even if I said "I didn't stop eating meat," that still presupposes I was once eating meat. Presuppositions are slippery, insidious things that people use to convince other people of things without their knowing. It's really useful to know what a presupposition is, because if someone tries to presuppose something on you, then at least you'll have the language to detect what it is. And it's precisely insidious because it's in the background: you're focusing on "did I stop eating meat?" without realizing that you just accepted a presupposition you might not agree with.

[00:25:13] Paul Grice, the famous philosopher, framed language as a kind of cooperative game between a speaker and a listener, where the dynamics of the game are what give rise to things like conversational implicature and presupposition. This goes back to the earlier point: language is really a game between speakers and listeners who are trying to communicate and agree on something, and the conventions, what language means in all these contexts, are really context-dependent and fluid.

[00:25:55] Just a few other ideas: ambiguity, vagueness, and uncertainty. Let me try to explain what each of these means and how they differ. Ambiguity means that a sentence has more than one possible but precise interpretation. Here are some headlines; let me know what you think of them. "Stolen painting found by tree": what does that mean? How about "Iraqi head seeks arms"? Or "Local high school dropouts cut in half," "Juvenile court to try shooting defendant," "Kids make nutritious snacks," "Ban on nude dancing on governor's desk." I can see you're smiling a little bit: these headlines are funny because they have a serious meaning and then a meaning that is totally ridiculous but nonetheless technically possible.

[00:26:55] Vagueness is where a sentence has just one interpretation, but it doesn't specify the full information. If I said "I had a late lunch," there's no ambiguity there; I just didn't tell you what time I had lunch, maybe one o'clock or two o'clock or something.

[00:27:17] Uncertainty is another form of not knowing something, and it's due to not having a perfect model. Say "the witness was being contumacious": some of you probably know what that means, so you're not uncertain; but some of you probably don't, and you have this uncertainty, which is not a property of the sentence but of the listener's ability to understand natural language. All these things are useful to think about separately, although they often get conflated, especially in more model-free methods.

[00:28:01] I will say that there is another school of linguistics called distributional semantics, which also goes back to the 1950s, and I'll give you the basic idea. If I give you these sentences: "the new design has blank lines," "let's try to keep the kitchen blank," "I forgot to blank out the cabinet," what does "blank" mean, or which word goes there? Someone say the answer. The answer's in the chat? Oh, okay, I didn't know there was a, this is in the Zoom chat. Yep, okay, let's see
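That fill-in-the-blank exercise is the distributional idea in miniature: you can guess the word from its contexts alone. A toy sketch of this, with an invented mini-corpus and a context window size of my own choosing, builds context-count vectors and compares them with cosine similarity:

```python
from collections import Counter
from math import sqrt

# Invented mini-corpus echoing the fill-in-the-blank exercise:
# words that occur in similar contexts should get similar vectors.
corpus = [
    "keep the kitchen clean",
    "keep the kitchen tidy",
    "clean out the cabinet",
    "tidy out the cabinet",
    "the dog ate the roast beef",
]

def context_vector(word, window=1):
    # Count the words appearing within `window` positions of `word`.
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, t in enumerate(tokens):
            if t == word:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts.update(tokens[lo:i] + tokens[i + 1:hi])
    return counts

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in set(u) | set(v))
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# "clean" and "tidy" share their contexts; "clean" and "dog" share none.
sim_clean_tidy = cosine(context_vector("clean"), context_vector("tidy"))
sim_clean_dog = cosine(context_vector("clean"), context_vector("dog"))
print(sim_clean_tidy > sim_clean_dog)
```

Real systems use far larger corpora and learned embeddings rather than raw counts, but the principle is the same one Firth's slogan captures: a word is known by the company it keeps.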
[00:28:51] Uh... I think I lost my... [Music] Okay, chat, okay, there we go. Ah, okay, there are our answers. Okay: "clean." Great. [00:29:06] Okay, got it. [00:29:09] So the idea behind distributional semantics is that I didn't have to tell you what the word means; the meaning of the word is characterized by the contexts in which it appears. This is the idea of the distributional hypothesis: semantically similar words occur in similar contexts. Or, more eloquently said by Firth: you shall know a word by the company it keeps. [00:29:31] So this is another way of thinking about semantics, and actually the one that has really been picked up, because it's so synergistic with modern statistical techniques. [00:29:46] So, just to summarize: there are two ways of thinking about the semantics, or meaning, of sentences. One is compositional semantics, where it's more top-down and you model
first. You think about how language works; you think about parse trees or semantic forms, and you can capture a lot that way. We went through a lot of examples where you can feel that language does obey certain types of structure. On the other hand, you can think about distributional semantics, which is a bottom-up, data-first approach, generally associated with vector spaces, where you don't really try to nail down what meaning is, but just associate a word with the set of contexts in which it appears. [00:30:38] So let me do another poll. Hold on, I need to create this question, and ask: what do you think is the best way to achieve natural language understanding? [00:31:02] Is it compositional semantics or distributional semantics? [00:31:10] Okay, so go to Slido, and
I'm curious what you think. [00:31:20] Okay. [00:31:28] It would have been interesting to go back maybe 10 years and ask this question, because I think the answers would have been quite different, and I'll talk a little bit more about that in a bit. [00:31:40] So it looks like it's about 30 compositional and 70 distributional; maybe a quarter and three quarters. So most of you think that distributional semantics is the way to go, which is maybe concordant with what is happening in the world right now. [00:32:05] Okay, so why don't I take a few minutes right now? I went through a lot of material, so maybe I'll just ask if there are any questions to discuss. [00:32:45] No questions? Someone has to have a question. [00:32:55] So one question is: context information is never spelled out, but the meaning depends on who is speaking and where. Yeah, so I've been deliberately vague about what "context" is. Traditionally it's linguistic context, which is the words next to it in a
sentence, but you could imagine that it could be very much generalized to the context of the speaker: multimodal context, what's going on in the world, who is speaking to whom. All of that rich contextual information is definitely useful for understanding the meaning of a word. [00:33:43] Another question: in the beginning you said that humans have the highest level of communication; how likely is it that some animal actually has a much higher level of communication, but we're not smart enough to understand it? Yeah, that's an interesting question. It's almost a little bit of a philosophical question, because it is in theory possible that some animal has a brilliant system of communication and
we just didn't measure it properly. People have been surprised by how sophisticated certain animals are at communicating: dolphins, or elephants, or even bees. Often people draw the line at having recursion, or language that's able to express compositional thoughts, versus communication which is maybe very contextual and nuanced but doesn't have that level of abstraction. And according to that, I think we're pretty sure that humans are the ones that have the most abstraction. But then again, I guess this is also a very human-centric way of defining what "highest level of communication" means, because maybe some other creatures have communication that is more contextual, more nuanced, than human language. [00:35:19] Elephants
communicating below 20 hertz, in infrasound. [00:35:25] Okay, and Hitchhiker's Guide. Thanks, that's a good one. How about plants? Communication can be chemical, color, temperature, even touch. Yeah, so there's a lot of other kinds, and I guess in general communication is not the same thing as language. [00:35:51] So I'm using "language" very narrowly here to mean human language, and what we know to be human language. Of course, even humans communicate in other ways, like gestures and so on. [00:36:08] Okay, great, let me move on from the questions. What I want to do next is to talk about building language understanding systems, and what has happened over the last 60 years, now that we have maybe a greater appreciation of what language is. [00:36:32] So we've kind of seen this slide before: it's the Turing test. Alan Turing in 1950
asked a philosophical question and devised the Turing test, to test whether a computer (or a machine) could be intelligent, by seeing if it could talk to a human and convince the human that it was actually a person. [00:36:57] This is one of the dialogues from his paper; you can read it. What I want to emphasize here is that Turing was not interested in language, and wasn't trying to design a language understanding test. He was trying to design a test of intelligence, and language was just the means to convince someone, or to verify that there was something, you know, up there. [00:37:26] And so I think this is very interesting from an AI perspective, because language does seem to have a unique capability: in order to have intelligence, you need to be able to
essentially show it in some way. You can think all the deep thoughts in your head, but if you don't have a way of externalizing them, well, it's like the "if a tree falls in the forest, does it make a sound?" setting. So language has turned out to be a very useful way of probing for understanding. [00:38:12] One of the early systems I want to highlight is ELIZA. Joseph Weizenbaum, when he was at MIT, developed this system called ELIZA, which was really meant to be a parody of a psychotherapist. So you would lie on a couch and you would talk to the system, and some dialogues would go like this. ELIZA would say: "Is something troubling you?" "Men are all alike." "What's the connection, do you suppose?" "They're always bugging me about something or
other." "Can you think of a specific example?" "Well, my boyfriend made me come here." "Is it important to you that your boyfriend made you come here?" "He says I'm depressed." And so on. [00:38:55] What was interesting about this system is that the system itself is very simple. This is the 1960s, so there's not that much going on in terms of complexity; it was based on rules and matching. So you can imagine a rule that says: if the word "alike" shows up, then you ask "what is the connection?"; or if you say "always," then ELIZA says "can you think of a specific example?", and so on. [00:39:27] So it was very simple, but what Joseph Weizenbaum found out, which was really surprising, is that the people he showed this to actually started getting emotionally attached, and there was one incident where Weizenbaum's
secretary actually asked Joseph to leave the room, so that the secretary could have a real conversation with ELIZA. [00:39:51] So this was in the 60s, and I think it was perhaps telling of what was to come later. I'll talk about GPT-3, which is obviously a much more realistic version of this, but you can definitely think about some of the consequences of that technology. [00:40:10] Incidentally, Weizenbaum later in his career became very pessimistic, and actually very negative and critical about technology, maybe because he had this epiphany that what we're building is actually maybe not so good after all. [00:40:27] So this is one of my favorite natural language systems. It was built by Terry Winograd, who was also at MIT but moved to Stanford, where he became an HCI faculty member for a
number of years. It's called SHRDLU, and the idea is that you have a person who is able to conduct a dialogue about a blocks-world environment. "Pick up a red block." "Okay." "Grasp the pyramid." The computer is going to say when it doesn't understand things. "Find a block which is taller than the one you are holding and put it into the box." So it's fairly complicated, in that the computer can reason, do anaphora and coreference resolution, ask for clarifications, and so on. [00:41:12] What I think is remarkable about the system is that it was an end-to-end system: it included a parser, and it could do semantic interpretation, dialogue, planning. It wasn't just a language system; in fact it was framed more as an AI system that could allow a robot to do things in the world. And so this was, in some
sense, kind of the first really super-ambitious project for its time. [00:41:46] However, while SHRDLU worked really well in its limited domain, Terry Winograd later wrote this paragraph, which is interesting. He said a number of people suggested to him that this was a dead end: in programming, complex interactions between the components made it just really hard to understand what was going on. So eventually Terry couldn't even extend the program, because it was just too hard to keep in his head. [00:42:20] So this is interesting, because, as we know, language understanding didn't really get solved despite these narrow successes. [00:42:32] And the history of NLP mirrors quite closely the history of AI in general. Remember, in the first lecture I talked about how AI was filled with more of
these logic-based methods, which didn't quite scale. [00:42:51] What's interesting is that at that time in AI in general there were people working on neural networks, although they were a small minority; but in language it was perhaps even less so, because language is actually a discrete communication system. And there was a rich body of work in linguistics, and NLP and linguistics co-evolved in certain ways that made it very natural to embrace all the logical structure that was embodied in language. [00:43:30] But I think people realized that there were cracks showing at the seams, even in the 70s but especially in the late 80s, and in 1990 it was time for a new set of methods to come onto the scene. [00:43:49] So this actually started
a bit earlier, with speech recognition, because speech and language are closely related, and speech is definitely the bridge between the continuous, noisy world, where you want to be doing more pattern-recognition-type things, and the logical world. [00:44:12] So HMMs, hidden Markov models, were developed for speech in the 70s and 80s, and finally in 1990 there was a famous paper from IBM Research, colloquially called the IBM models for machine translation. They developed a probabilistic model that could translate between two languages. Before then, translation was completely logical, grammar- and rule-based, and this was a radical way of thinking about it. It's actually essentially based on a lot of the Bayesian networks that we'll see later in the course.
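The noisy-channel recipe behind the IBM models can be caricatured in a few lines: among candidate English sentences e, pick the one maximizing P(e) * P(f | e) for the observed foreign sentence f. The probability tables below are invented toy numbers, and the word-by-word translation model ignores alignment entirely; the real IBM models learn word-alignment probabilities from parallel corpora.

```python
from math import prod

# Toy language model P(e): how plausible is each English candidate?
P_e = {
    "the house": 0.5,
    "house the": 0.01,
}

# Toy word-translation model: P(foreign word | english word).
P_word = {
    ("la", "the"): 0.7,
    ("maison", "house"): 0.8,
    ("la", "house"): 0.05,
    ("maison", "the"): 0.05,
}

def score(f_words, e):
    """Noisy-channel score P(e) * P(f | e), with a crude word-by-word
    translation model (no alignment), smoothed for unseen word pairs."""
    e_words = e.split()
    p_trans = prod(P_word.get((f, w), 1e-6) for f, w in zip(f_words, e_words))
    return P_e[e] * p_trans

f = ["la", "maison"]  # French input: "la maison"
best = max(P_e, key=lambda e: score(f, e))
print(best)  # → the house
```

The factorization is the point of the design: the language model P(e) penalizes disfluent orderings like "house the," while the translation model penalizes mistranslations, and that split is exactly the kind of Bayesian-network structure mentioned above.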
[00:44:51] So for a lot of the 90s, these so-called generative models (think of them as extensions of Bayesian networks) really dominated NLP. [00:45:01] Then around 2000, people started turning to discriminative models, a.k.a. linear classification, and there was another famous paper which introduced conditional random fields, which marries the structure that was so inherent in language with basically linear classification. [00:45:35] So this was used to do things such as named entity recognition, where you would mark up words as names of people or companies and so on. And so, instead of predicting just one y from x, you predict a bunch of y's from x, where the y's are the labels of all the words. [00:45:56] So this technology was actually quite influential in NLP, but also
more broadly in computer vision, where for much of the 2000s this was the main way people tackled structured tasks. [00:46:18] Another thing I'll mention is latent Dirichlet allocation, which also came from models of language, and here the emphasis is on unsupervised learning: you point LDA at text and it can discover topics in the text. So here's a text where it can discover things like, oh, some words are about budget, some words are about children, and some words are about arts, in an unsupervised way. And this led to a whole cottage industry of topic-modeling papers, and LDA continues to be something that is commonly used in practice. [00:46:59] What I will say is that, with a lot of these developments, it's interesting to think about how they were developed by
someone trying to address a problem in natural language processing, and it led to more general technology that was then applied in all sorts of different areas, like computer vision and genomics and so on. [00:47:26] Okay, so now the 2000s are ending, and we know that at the end of the 2000s deep learning really started gaining momentum. ImageNet was 2009, so it wasn't huge yet, but there were definitely rumblings. And it's interesting, culturally, how the NLP community reacted. At the time, NLP and vision were both very skeptical, and if you think about where NLP had been: a lot of people still viewed language as structure-heavy. Language has a lot of latent structure, and there was no way that this mess of neurons could actually do anything with this kind of
intricate structure. You can think of a lot of the work in the 2000s as a marrying of this structure with statistical methods: putting probabilistic choices on top of a very rich, discrete structural backbone. In some ways this was a reconciliation of compositional semantics with distributional semantics; you have both. But it was still largely based on traditional linguistic thinking. [00:48:59] Then I remember there was this 2011 workshop at NeurIPS, the machine learning conference, and I was at it. There were a bunch of machine learning people who were using vector-based models to argue that this covers semantics, and then you have Ray Mooney, who is much more of a
logic, old-school-AI kind of person, at least at the time. A heated argument broke out, and he is famous for saying that you can't cram the meaning of a whole sentence into a single vector. [00:49:35] Okay, so that captured the attitude at the time. And then things started changing. I think the first move was word2vec, which was a way of taking lots of text and embedding words so that each embedding characterizes the context of that word. Word representations had actually been around since the 90s, but somehow word2vec came at the right time and really caused people to pay attention. And I think one thing people noticed, which gathered a lot of attention, was the fact that you could do
analogies: for example, if you embed things in a vector space, you see that countries and capitals are related by a consistent relationship, with some asterisks. [00:50:31] There was a recent paper from last year which I'll highlight because I thought it was really interesting: six years later, using more or less the simplest method, they ran word2vec on about three million abstracts of materials science papers, just strings, and they were able to discover certain types of patterns by looking at the vector space, and actually predict certain compounds as having certain material properties, like being thermoelectric.
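The analogy arithmetic mentioned above (consistent country-to-capital offsets) can be illustrated with hand-made toy vectors. Real word2vec embeddings are learned from co-occurrence statistics and have hundreds of dimensions; the words and numbers below are purely illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hand-made toy "embeddings" (invented for illustration).
emb = {
    "paris":   [1.0, 0.9, 0.1],
    "france":  [1.0, 0.1, 0.1],
    "rome":    [0.1, 0.9, 1.0],
    "italy":   [0.1, 0.1, 1.0],
    "berlin":  [0.9, 0.9, 0.15],
    "germany": [0.9, 0.1, 0.15],
}

def analogy(a, b, c, emb):
    """Solve a : b :: c : ? by the vector arithmetic b - a + c,
    then return the nearest neighbor among the remaining words."""
    target = [emb[b][i] - emb[a][i] + emb[c][i] for i in range(3)]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("france", "paris", "italy", emb))
```

Here the france-to-paris offset, added to italy, lands closest to rome, which is the country/capital regularity being described (with the same asterisks that apply to the real embeddings).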
So it's an interesting view of how something that's so dead simple, that knows nothing about chemistry and only knows about word co-occurrences, can actually generate some interesting insights. [00:51:25] Now, word2vec wasn't deep learning in the sense that it was only, I guess, one layer, so it was kind of shallow learning. And I think 2014 was when the deep learning community really, in some sense, vindicated itself in the NLP community. There's the sequence-to-sequence learning paper from Google in 2014, where they did machine translation by taking a sentence and running an LSTM over it (if you don't know what that is, it's fine; it's just some black box that embeds the sentence into a single vector), and then, using that vector, it spits out a new sentence. If you watch the module on differentiable programming, it'll give you a better idea of what I'm talking about. [00:52:25] So this was really cramming the meaning of a sentence into a vector, literally. At the time the results were only okay, but it was enough of a proof of concept, and surprising enough, that later extensions of it really transformed into actually usable technology. So it's interesting to look at the progression from rule-based machine translation, where there's no machine learning, to statistical machine translation, which is data-driven but still based on more or less some sort of structure of language, to the neural world, where there's even less structure, and things have gotten better.
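The "cram a sentence into a single vector" bottleneck can be illustrated with a deliberately crude stand-in for the learned encoder: whatever the sentence length, the output has a fixed size. The hashing trick below is an assumption for illustration only; a real seq2seq model learns this mapping with an LSTM or similar network.

```python
def encode(sentence, dim=16):
    """Toy stand-in for a learned sentence encoder: squash a sentence
    of any length into a fixed-size vector by hashing words into
    buckets. (A real LSTM encoder learns this mapping; this only
    illustrates the fixed-size bottleneck.)"""
    vec = [0.0] * dim
    for position, word in enumerate(sentence.lower().split()):
        bucket = hash(word) % dim
        # Crude position weighting so word order matters a little.
        vec[bucket] += 1.0 / (1 + position)
    return vec

short = encode("hello world")
long_ = encode("the quick brown fox jumps over the lazy dog again and again")
# Both vectors have the same length: the whole meaning of either
# sentence has to fit into `dim` numbers, long sentence or not.
```

That fixed bottleneck is exactly what Ray Mooney was objecting to, and what the 2014 result showed could nonetheless work surprisingly well once the encoder is learned.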
There's a researcher, Fred Jelinek, who is famously quoted as saying that every time he fires a linguist, his accuracy goes up. I'm not sure those were his exact words; it's at least kind of an urban legend. [00:53:36] One other note I'll make is that machine translation seems to be the task, at least in NLP but maybe more broadly, that has really pushed the limits of these technologies, and I think it was really the driver that got seq2seq technology going. [00:53:55] So in 2016 Google completely transformed their machine translation system: instead of having multiple systems, one for every language pair (so n-squared pairs), you have one system that can translate between any pair of languages, which was really mind-blowing at the time and is still, I think, impressive. One more thing I'll mention: I think I
mentioned some of these things already, but it's worth highlighting that these statistical methods do have a lot of biases in them. So if you translate these sentences, you get genders appearing that are correlated with certain types of professions. This one is even more extreme: if you take a rare language, where there's not much data, and you pump in something that's just garbage, you get some really disturbing translations coming out. This was from a few years ago, so they might have fixed it, but nonetheless: it turns out you can cram a lot into a vector, but there's some really weird stuff in there. [00:55:06] So maybe this is a good time to pause if there are any questions, before the next wave of slides. There are some questions on Slido. Oh, okay, sure. I guess
I was looking at the Zoom chat. Okay, so, questions. [00:55:28] Oh, whoa, okay. Yes, I'll cover GPT-3. How about body language? I think I mentioned that gestures are typically not studied in NLP, but they're definitely fair game for communication, and there's an interesting subfield of NLP that has to do with grounding, with how people use language in the world, where it's natural to consider gestures. [00:55:59] Let's see. Can we build a common language with precise meaning, so all languages can be referenced against it? This has been tried: there was a language called Lojban which was developed to remove all ambiguity from language, so it would be precise and everyone would know what you mean. Personally, I think it was a
little bit of a fool's errand, because I think that ambiguity is exactly what allows language to be so efficient. The meaning of a sentence is a function not only of the words but also of the context, so if the context already makes some things obvious, then you don't really need to say them. And also, you have to take into account ease of acquisition: almost by construction, it's much easier to learn the languages we've evolved to learn, and something that's designed is generally not going to be very productive. [00:57:17] So, is NLU a combination of plausibility and fluency? I guess I haven't quite defined natural language understanding, because I think there's no accepted definition of it. You can think of it as demonstrating proficiency on a number
of tasks, such as question answering or translation. Or, if you think about generation, you have to think about plausibility and fluency, but also truthfulness. [00:57:53] Why did you choose to do research in NLP as opposed to other areas, and how would you think about what to study in the future? (Sorry, Slido has this icon in the way that makes this hard for me to read.) So, how did I choose to do research in NLP? This is very much a personal thing, so I don't think my answer is necessarily right for everyone. But just the idea of what you can do with language seems so powerful to me. Like I said, it's one of these things that is so uniquely human, and also I think it
seems like a window into understanding cognition, because it's a way to do the I/O, in and out of brains, I suppose. [00:58:58] Is there research on how to incorporate real-time changes in language, like the constant emergence of new words and phrases? Yes, there is a lot of interesting work studying language change over time. There's historical linguistics, which talks about larger-scale changes, Latin to Spanish and French and so on, but there's also an interesting opportunity to do this on the web, because internet language changes very quickly. There are always new words coming up, and on Twitter things can be geo-tagged, so you can actually witness how language spreads over time. So yes, there is an
active area of work thinking about language change, not just new words but also existing words changing in meaning. For example, the word 'awesome', I think, used to mean something more like 'awful', but it flipped in sentiment from negative to positive. [01:00:17] Cool, so maybe I will move on. Thanks for the questions. Okay, so, up until this point: 2014 is when deep learning really started gaining momentum, and from 2014 to 2018 it was really about neuralizing everything. You have neural parsers, coreference resolution systems, named entity recognition systems, everything under the sun. And the numbers went up; things did get better, because the models were just more powerful than what existed previously. I think a
big turning point came in 2018. There's this paper called 'Deep Contextualized Word Representations', maybe better known as ELMo, and the idea behind ELMo can be summarized as follows. Imagine you're trying to do question answering. Our group actually spent quite a bit of work creating the SQuAD question answering dataset, with a hundred thousand examples. It was a lot of work to get that, but in some sense a hundred thousand examples is really, really small compared to the massive amounts of text on the web. So the idea behind pre-training is that you train a language model to predict the next word given the previous context. This is called, I guess, self-supervision: you just make up a task, namely predict the next word given the previous words, and then you learn embeddings.
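The self-supervision setup just described, turning raw text into a prediction task with no human labels, can be sketched with a toy bigram counter. The corpus is invented for illustration; real pre-trained models learn a neural network for the same next-word objective instead of counting.

```python
from collections import defaultdict, Counter

def train_next_word(text):
    """Self-supervision in miniature: the 'labels' are just the next
    words already present in raw text, so no annotation is needed.
    This builds bigram counts; ELMo-style models train a network on
    the same prediction task."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Most frequent continuation seen after `word` in training text."""
    return counts[word.lower()].most_common(1)[0][0]

# Toy corpus (invented for illustration).
corpus = ("the cat sat on the mat . the cat sat by the door . "
          "the dog ate the bone .")
model = train_next_word(corpus)
print(predict_next(model, "cat"))
print(predict_next(model, "the"))
```

The point is that the supervision comes for free from the text itself, which is why pre-training can use the whole web while SQuAD-style labeled data stays comparatively tiny.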
You then use those embeddings to drive some downstream tasks where you have much fewer labeled examples. And the result was that across the board, across a number of different benchmarks, the accuracies went up by a few points. I guess it's maybe hard to really appreciate what three points means, but these systems were already hard to improve: a one-point gain was very good, and this was a substantial gain across a wide variety of tasks. So this got a lot of NLP people really excited. [01:02:24] Later that year, BERT came out from Google. I'm not going to go into the details; if you watch the differentiable programming lecture, I explain a bit more about what BERT is doing. But think of it as a masked language model, which is, roughly, predict a word given its context.
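The masked-prediction objective can be sketched the same way: hide a word and predict it from its surrounding context. The counting model and toy sentences below are invented for illustration; BERT learns a deep network for this fill-in-the-blank task rather than a lookup table.

```python
from collections import Counter

def train_masked(sentences):
    """Toy masked-word model: for each interior position, record which
    words were seen between a given (left, right) context pair. This
    counting only illustrates the training signal BERT learns from."""
    table = Counter()
    for s in sentences:
        w = s.lower().split()
        for i in range(1, len(w) - 1):
            table[(w[i - 1], w[i + 1], w[i])] += 1
    return table

def fill_mask(table, left, right):
    """Most frequent word seen between `left` and `right` in training."""
    candidates = {w: c for (l, r, w), c in table.items()
                  if l == left and r == right}
    return max(candidates, key=candidates.get)

# Toy training sentences (invented for illustration).
sents = ["the cat sat on the mat",
         "the cat sat on the rug",
         "the dog slept on the mat"]
table = train_masked(sents)
print(fill_mask(table, "cat", "on"))
```

The difference from the next-word objective is only the direction of the conditioning: context on both sides rather than just the left.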
This was more or less the same idea scaled up and engineered properly, and it again yielded huge gains over previous methods. [01:03:06] So this really changed the game in NLP, from having specific architectures for different tasks to a world where you have one architecture that does multiple tasks. I guess I didn't mention that BERT, or BERT plus friends, some Muppet, is now used to power essentially all the downstream NLP tasks, like coreference resolution or semantic parsing and so on. So it really brings us one step closer to having a unified representation, one model that can rule them all, in a sense. [01:03:49] Going back to reading comprehension, one thing that is remarkable is that if you look at the
leaderboard and the accuracies, they're way above human-level performance, so these systems look like they're getting superhuman performance. But one thing we did a few years ago was to really probe whether these systems actually understood language. So here's a paragraph and a question: 'The number of new Huguenot colonists declined after what year?' The model correctly answers 1700, which is right here. But if you add a distracting sentence, which looks like it answers the question but doesn't, then BERT will get distracted and answer the wrong thing. And quantitatively, all the systems fall by quite a bit when you add this distracting sentence, whereas humans obviously don't get fooled by it as much. So one thing to keep in mind is that while
models have gotten impressive results on benchmarks, there are still these blind spots, which means that solving a benchmark is not the same as solving the actual underlying task. That can be misleading if you read headlines saying computers can read better than humans; that's just not true. Computers do SQuAD better than humans; that is true. [01:05:21] And what's a little more worrisome is that these models can be easy to break, but we don't actually know how to fix them, except by training larger models and hoping that they break less. [01:05:36] Okay, any questions before I talk about GPT-3? That's going to be the last topic. Okay, so: is naming algorithms after cartoon TV characters a thing, or just a coincidence for the two instances? I would gather that it's very much not a coincidence, because
coincidence, because you also had ERNIE that came out afterwards, and Big Bird — clearly people are going along with a theme. There's another cast of characters: BART and MARGE. I guess they're not Muppets, they're from The Simpsons, and that's another line that Facebook has been pursuing. [01:06:21] Is there any work on improving comprehension — reading between the lines, understanding? It's actually interesting, because these large models are so contextual and leverage so much external world knowledge about text that they're almost reading too much between the lines. They're making inferences and making assumptions, which is what leads to all these biases in the models: it's not stated in the text, they're just learning from associations. [01:07:00] Okay, let me move on to get to
the final thing. So, May 2020: OpenAI releases GPT-3. I'm skipping a bunch of other models like GPT and GPT-2 in the interest of time. This was essentially a big language model — I mean, big is an understatement; it's a ginormous language model trained on Common Crawl, which is our best approximation of the internet, so to speak. It has 175 billion parameters, whereas BERT had maybe 300 million parameters or so, so this is much, much larger. [01:07:46] The interesting thing about GPT-3 is this ability to do what they call in-context learning. Traditionally, if you use BERT, what you would do is: you have a model, you show an example and perform an update, you show another example and perform an update, and so on. This is called fine-tuning a language model. But GPT-3 showed that that was
actually not necessary to get some interesting performance. So you could actually do zero-shot learning: you could say "translate English to French", then "cheese", and then you give it the prompt, and it will actually do something reasonable. Or you give one example, or a few examples. And notice that this is not training data in a conventional sense, where you optimize a loss — this is actually the input into the language model. All a language model does is: you give it a string and ask it to predict the next string. So this is encoding a task in natural language, which is a very different way of thinking about learning. [01:08:57] Let's see, do I have enough time? Maybe I'll... So OpenAI has this Playground; let me just show you a little bit. There are many things you can do. You
can — [01:09:12] so this is the prompt, and I'm going to say: "How can I help you today?" — "Can you tell me the difference between SARSA and Q-learning?" I actually don't know whether this will work. Okay: "Sure... policy inspired by Q-learning..." Okay, so this doesn't really answer the question. That was nice, but it didn't answer the question. [01:09:52] How about something else: "Who founded Microsoft?" Okay, so this is also a lie. So you can see that it generates fluent English, but it sometimes doesn't have the best tendency to tell the truth. [01:10:13] This is another example. Again, this is a prompt — let me dig this up. This is kind of from the course syllabus. [01:10:36] So this is some complicated expression, and then this is
again what you feed into GPT-3, and it says: "Artificial intelligence is the magic that makes computers do things that they're not supposed to do, like talking to their cars." Interesting. Okay, so you can judge for yourself what to make of this. Anyway, if you have access, you can burn quite a bit of time playing with this. [01:11:02] Another example I'll show: you can train it on CS221 quizzes. You train it on quiz one questions and answers, and you see how well it does on quiz 2.
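The in-context learning idea from a moment ago — the task description and any demonstrations go into the model as one input string, with no parameter updates — can be sketched in a few lines of Python. This is a minimal illustration under my own naming, not OpenAI's actual API; sending the prompt to a model is left out.

```python
# Minimal sketch of in-context learning: the "examples" are just part of
# the input string fed to the language model -- no gradient update happens.

def build_prompt(task, demonstrations, query):
    """Encode a task description plus demonstrations as one text prompt."""
    lines = [task]
    for source, target in demonstrations:
        # Demonstrations are conditioning text, not a training set.
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is asked to continue from here
    return "\n".join(lines)

# Zero-shot: only the task description, no demonstrations at all.
zero_shot = build_prompt("Translate English to French:", [], "cheese")

# Few-shot: a handful of demonstrations included in the prompt itself.
few_shot = build_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "mint",
)
print(few_shot)
```

Fine-tuning, by contrast, would loop over the (source, target) pairs and take a gradient step on each; here they are only text that the model conditions on when predicting the continuation.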
[01:11:15] So here are the prompts. What is bold is given to the algorithm — this should be familiar from quiz one — and then you ask it which of the following are examples of regression. It answers A, C, and D, which is wrong, but then it offers an explanation, which starts off on target: examples of supervised regression — house price estimation, which is actually pretty good; spam detection, which contradicts what it answered; and this "unsupervised regression", which I don't know is a thing. [01:12:00] So you can get a sense that GPT-3 is really good at generating text which, if I didn't tell you it was fake, you'd probably read and believe — you have to look pretty closely now to know that something is up. [01:12:25] Okay, so I'm
trying to give you maybe a more balanced view of GPT-3. I think if you go on the internet and look at Twitter, you'll just be completely blown away by all the awesome things people are building out there, but I want to balance that out with some things which say, yeah, maybe it's not doing everything. [01:12:45] Several other things to mention. Gender bias, I think, is very much on people's minds in NLP; they have an entire section in the paper that calls out that, yes, not surprisingly, GPT-3 has gender bias. I mentioned before that there was actually someone who used GPT-3 to generate a blog post, and it ended up number one on Hacker News for a while. And there are even papers coming out saying that GPT-3, primed with extremist context, can generate more extremist content, which is again perhaps not
surprising, given that this was trained on the internet, and the internet has many lovely things in it. [01:13:28] So, some things to think about. One clear question here is: can we make GPT-3 more unbiased and less toxic? Certainly out of the box, this is just a wild tool — you should not be using GPT-3 directly to generate things to show people. Another question is: what are the societal impacts of automatic text generation? This is relevant even if it's not generating biased things, but especially if it is. The obvious worry is that fake news is already a problem, and this is just a massive amplifier for producing large amounts of credible-sounding text, which just swamps out anything anyone
wants to say. So that's potentially a pretty dangerous world. [01:14:29] On the more scientific side, I think there's still the outstanding question of whether we can achieve language understanding without a model of the world, or without world experience. GPT-3 is only trained on text, and furthermore it's just a large Transformer model with no internal structure. One question is: can it be said to understand language, or does it need that grounding experience? If there were more time, I would actually run a poll and get people to discuss this, because I think it's actually not obvious what the answer is. [01:15:13] Maybe I'll end with final remarks and then take questions. So, starting at the top, I want to
highlight that language is an incredibly rich, complex communication system — quote-unquote, "glorious chaos" — which I think is fascinating to study. At the same time, there's a lot of regularity and structure. There has to be, because otherwise we wouldn't be able to productively coordinate and use language as systematically as we have been able to. Current models kind of ignore this, but it's also unclear how to incorporate this information in a way that actually matters. [01:15:55] As you can see, the field is moving at an incredible pace: GPT-3 was in 2020,
BERT was only two years ago, seq2seq was six years ago, and in just the last six years NLP has been completely transformed — I would say even in the last two years it's been completely transformed. So it's interesting what will happen in the next year, and I think there's even more urgency to understand what these models are doing and make sure the technology is directed in the right way. [01:16:29] And then of course there's a lot more to be said and learned. There are wonderful classes at Stanford that you can take — CS124, CS224N — to learn more about NLP. So with that, I will take any questions. [01:16:52] Okay: is voice language a better source than text language — so, spoken audio? It depends what you mean by better. Voice does contain prosodic cues, which contain
more information than what's just in the text. Unfortunately, there's not that much of it compared to all the text on the web. I think a lot of the issues of language understanding can be somewhat factored out from speech: often people convert spoken language into written language and take it from there, although there's definitely interesting research in speech as well. [01:17:38] How do I feel about whether understanding multiple languages is a plus for research in NLP? A lot of research in NLP is very English-centric, which I think is a potential problem if you think about fairness. Many rare languages — low-resource languages, as they're called — don't enjoy
such high accuracies, and those are probably the people who need it the most. So there is a community of people in NLP who are very much interested in designing more efficient learning algorithms that help low-resource languages. [01:18:24] Finally, how has the study of NLP helped linguistics grow as a field? This is a really interesting question; unfortunately, I won't have time to really do it justice. Chris Potts, a linguist at Stanford, has lots of insightful opinions about this. Linguistics is interesting because, for much of its history, it was so dominated by formal grammars and semantics — starting from Chomsky, in some ways — that the whole field came to be dominated by a certain way of thinking, and some of these other perspectives haven't really had a
chance to breathe. I think a lot of this formal semantics is still very much in the logical tradition, where you have sentences that you look at carefully and try to intuit what might be going on. It's a very different approach from looking at a broad corpus and trying to make sense of what's there. But this is starting to change a bit; at least people are thinking about how the types of models we're developing in machine learning could be useful. Unfortunately, I think it's a hard ask, because in some ways these deep learning models don't give you much more than an existence proof that certain types of data lead to certain types of behavior, and that doesn't necessarily give you insight into language
itself, because interpretability is kind of a missing component. But I think this is a really great question, and it's interesting to ask how these models can actually help us understand language better — help *us* understand language better, not help computers exhibit better language understanding. [01:20:31] Can we alter or add to the objective function of modern models to make them more logical and coherent? This is a natural direction that a lot of people are thinking about: why can't you have both — the richness of modern neural models plus some logic? There's been a bunch of work that tries to fuse the two together. Personally, I feel this is far from being solved by a simple combination, because I think
there's something maybe deeper about how things should be structured. You can add a regularizer that makes BERT more consistent, or GPT-3 more consistent, but the problem is that the reason we're using these models in the first place is that what they capture can't be captured by logical regularities, and many of the advantages we're getting from them are in places where logic fails to deliver. So in those areas, we don't really have the option of just slapping on a logical regularizer — otherwise we would have just built a logical system. [01:21:51] How far off is real-time spoken communication in multiple languages? So this is simultaneous translation: if I'm on Skype and I speak in English and it comes out in Japanese
or something. This is, I would say, not mature technology by any means right now, but it's coming, and there's been work on it. First of all, speech recognition is getting really, really good. Then the other main challenge is machine translation: what makes real time difficult is that word order differs across languages, so you can't translate word by word — you have to wait a bit to get enough context and then translate, and so forth. But there are models in NLP that try to do this, and I think you can do a lot by essentially predicting what the speaker is going to say. So a lot of this can be done without even a deep understanding of language. What I think we've
[01:23:03] What I think we've learned from translation is that while translation is getting 90% of the way there, that doesn't really require any understanding of language; it's just matching symbols contextually. Getting the remaining bit, and having translations that you can actually trust and that are nuanced and proper, is going to require quite a bit more work.

[01:23:33] Okay, another question: are there examples of building languages with RL, making agents communicate in a multi-agent environment? There is a bunch of work on what is called multi-agent communication, where people set up some sort of environment and train a bunch of RL agents to act in it, where one of the actions is to talk.
[01:24:07] Right, so this is an interesting experiment where you can actually get certain types of languages to evolve from this procedure. Language does help the agents play the game, or solve the environment, better, but it's rare that these languages automatically line up with our notions of natural language, because often these worlds are too limited for language to need to take on that kind of richness. And also, there's no pressure for that: human language is probably not optimal for anything in particular; it's just what we have and what we happened to evolve. But that's a good question.

[01:25:07] What are major problems in society that advancing NLP will solve, versus what problems it may create?
[01:25:15] There are a bunch of places where NLP can be used for societal good. One thing it can do, in principle, is allow a broader set of people, especially people who might not speak English, to tap into English-language resources, breaking down multilingual barriers. It's also been useful for analysis, for studying how people talk. Dan Jurafsky and others have a project analyzing how the language of police officers during stops compares depending on whether they're stopping a Black person or a white person, using NLP techniques to study that question. So one huge area is that language is used in a social and societal context.
[01:26:19] And therefore, building tools that help us manage and navigate that societal landscape can be really interesting. Problems it can cause: certainly fake text generation, and biases in models. If we start trusting translations or other systems, it could lead to amplification effects where the haves and the have-nots get pushed farther apart.

[01:26:51] All right, well, I think we're out of time, so thanks everyone for coming and listening, and have a good rest of the week.

================================================================================
LECTURE 056
================================================================================

General Conclusion | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=iUGmupxCdjs

---

Transcript

[00:00:05] Welcome everyone to the final lecture. Let me just share my screen and we can get going. Okay, so this lecture is going to be broken up into three parts. First, I'm going to do a quick recap of the class.
[00:00:29] Then I'm going to talk about future classes that you might take; hopefully this class has piqued your interest in AI. And then, finally, I'm going to end with some broad remarks on where AI is going and what we should all keep in mind. Since it's a live lecture, feel free to interrupt and ask questions; I'll monitor the chat, or Samara, if you notice anything, you can flag it to me as well.

[00:00:55] All right, so let's begin with a recap. First, congratulations on making it this far in the quarter; we've covered a lot of ground, so I'm just going to highlight some of the key things that you should keep in mind. Recall we started with the modeling-inference-learning paradigm. Modeling is the "what": it's about how you build a mathematical model that approximates the real world; it might be a neural network, it might be a Bayesian network.
[00:01:26] Inference is the process of using the mathematical model to answer questions. It's trivial for neural networks but can be really hard for Bayesian networks. And learning is how you take data and produce a model, so that you can do inference on it.

[00:01:43] In this course we talked about machine learning, then reflex-based models, state-based models, variable-based models, and logic, so let me go through each of them in turn. In machine learning, we presented the loss minimization framework, where you have a training set and you want to find parameters that minimize some loss. One thing I want to stress is how general a principle this is: the loss captures basically what you want a classifier to have, and we explored a few different types of losses depending on the task.
[00:02:12] And then we had a fairly simple algorithm, stochastic gradient descent, that was able to approximately optimize these objective functions. This is really the workhorse of machine learning; these two slides are most of machine learning. At least these days, it can be captured by writing down a loss function and optimizing it, and that works for neural networks, and it works for clustering problems like k-means, and so on.

[00:02:47] So I want to underscore that machine learning is a general way of being: it's the idea of taking data and turning it into models. But there are multiple types of models. We looked at reflex-based models in the very beginning: linear models, neural networks, nearest neighbors. Inference is just a feed-forward pass through the neural network, and for learning we used stochastic gradient descent, or k-means in the case of clustering.
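The loss-minimization recipe described here can be made concrete in a few lines. This is a minimal sketch, not code from the course: a linear classifier trained by stochastic (sub)gradient descent on the hinge loss, with a made-up toy dataset.

```python
import random

# Toy training set: (feature vector, label in {-1, +1}); data is made up.
train = [((1.0, 2.0), 1), ((2.0, 0.5), 1), ((-1.0, -1.5), -1), ((-2.0, 1.0), -1)]

def sgd(data, epochs=100, eta=0.1):
    """Approximately minimize the total hinge loss with stochastic gradient descent."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        random.shuffle(data)
        for x, y in data:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1:  # hinge-loss subgradient is -y*x when margin < 1
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

w = sgd(list(train))
# Every training point should now be on the correct side of the boundary.
print(all(y * sum(wi * xi for wi, xi in zip(w, x)) > 0 for x, y in train))
```

The same loop handles other tasks by swapping the loss: replace the hinge subgradient with, say, the logistic-loss gradient and everything else stays unchanged.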
[00:03:16] Then we looked at problems where you aren't interested in just a single decision but in a sequence of decisions, say getting from point A to point B, and we embarked on the journey of state-based models. Here, the idea of a state is a summary of all the past actions sufficient to choose future actions optimally; that crisply encapsulates what a state-based model is, and you've had lots of practice coming up with state-based models for various problems. If they're deterministic, those are called search problems, and you can use uniform cost search or A*. If you have randomness, you move to Markov decision processes, where you can use things like value iteration for inference. And games capture the cases where there's an adversary and you have to use a minimax formulation.
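Value iteration for MDPs, mentioned in this recap, can be sketched compactly. The tiny MDP below (states, actions, transitions, rewards) is invented purely for illustration:

```python
# Value iteration on a tiny, made-up MDP.
# States: 0, 1, 2 (state 2 is terminal); actions: 'a', 'b'.
# transitions[s][a] = list of (next_state, probability, reward)
transitions = {
    0: {'a': [(0, 0.5, 0.0), (1, 0.5, 1.0)], 'b': [(1, 1.0, 0.5)]},
    1: {'a': [(2, 1.0, 10.0)], 'b': [(0, 1.0, 0.0)]},
}
gamma = 0.9  # discount factor

V = {0: 0.0, 1: 0.0, 2: 0.0}  # terminal state 2 keeps value 0
for _ in range(100):  # sweep until (approximately) converged
    for s in transitions:
        # Bellman optimality update: V(s) = max_a sum_s' p * (r + gamma * V(s'))
        V[s] = max(
            sum(p * (r + gamma * V[s2]) for s2, p, r in outcomes)
            for outcomes in transitions[s].values()
        )
print(V[0], V[1])  # converges to V[0] = 9.5, V[1] = 10.0
```

Each sweep applies the Bellman optimality update to every state; for this toy problem the values stop changing after a few sweeps.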
[00:04:04] For search problems we didn't really touch on learning, although you can do that; and for MDPs and games we have these reinforcement learning algorithms. So you can really think of reinforcement learning as machine learning for state-based models where there's randomness in the environment.

[00:04:24] Then we moved on to variable-based models, which are a higher level of abstraction, a different modeling language if you will. The key idea here is a factor graph, which captures a set of variables whose values you want to determine, and factors (those are the little squares) which capture dependencies between variables. The key point is that the factors are generally local, but the questions you want to answer are global. When they're deterministic, we have constraint satisfaction problems, for things like scheduling, and we looked at backtracking and beam search.
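Backtracking search for a CSP can be sketched in a few lines; the two-color graph-coloring instance below is a made-up example, not one from the lectures:

```python
# Backtracking search for a toy CSP: color a small graph with two colors
# so that adjacent nodes differ (an invented instance).
variables = ['A', 'B', 'C']
domain = ['red', 'blue']
edges = [('A', 'B'), ('B', 'C')]  # binary "not equal" constraints

def consistent(assignment):
    # Check all constraints whose variables are assigned so far.
    return all(assignment[u] != assignment[v]
               for u, v in edges
               if u in assignment and v in assignment)

def backtrack(assignment):
    if len(assignment) == len(variables):
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domain:
        assignment[var] = value
        if consistent(assignment):  # prune inconsistent partial assignments
            result = backtrack(assignment)
            if result is not None:
                return result
        del assignment[var]  # undo and try the next value
    return None

solution = backtrack({})
print(solution)  # {'A': 'red', 'B': 'blue', 'C': 'red'}
```

The pruning step is what distinguishes backtracking from brute-force enumeration: inconsistent partial assignments are abandoned before being extended.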
[00:05:02] If you put on your probability hat, then we can turn these factor graphs into Markov networks by defining a distribution over all the random variables; for inference we looked at Gibbs sampling, but there are other methods as well. And then, to give more of an interpretation to how the factors are constructed, we looked at Bayesian networks, where each of the factors is a local conditional probability, and we looked at forward-backward and particle filtering methods, at least for chain-structured Bayesian networks. There's much more to be said here; this is just a taste of variable-based models.

[00:05:39] For learning, we only looked at learning for Bayesian networks, based on the maximum likelihood principle, but you can apply maximum likelihood to any probabilistic model, including Markov networks.
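For a Bayesian network, maximum likelihood learning is closed form: you count and normalize. A minimal sketch, with an invented two-variable network (R → W) and made-up data:

```python
from collections import Counter

# Maximum likelihood for a made-up two-variable Bayesian network R -> W
# (Rain -> Wet). Estimating p(w | r) is literally count-and-normalize.
data = [('rain', 'wet'), ('rain', 'wet'), ('rain', 'dry'),
        ('sun', 'dry'), ('sun', 'dry'), ('sun', 'wet')]

counts = Counter(data)               # count(r, w)
totals = Counter(r for r, _ in data) # count(r)

# p(w | r) = count(r, w) / count(r)
p_w_given_r = {(r, w): c / totals[r] for (r, w), c in counts.items()}
print(p_w_given_r[('rain', 'wet')])  # 2/3
```

With latent variables the same normalize step appears inside EM, except the counts are expected counts imputed by inference rather than observed ones.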
[00:05:49] For Bayesian networks it was really nice, because learning is closed form: you just count and normalize. For latent-variable models you have the expectation maximization algorithm, where you have to use inference to impute the missing variables, and then you count and normalize.

[00:06:08] And finally, we looked at logic-based models. The idea of logic is that it goes one level of abstraction higher: it introduces formulas, which allow you to represent more powerful things, even infinite things; you can talk about all the even numbers, for example, which is an infinite set. We looked at two models, propositional logic and first-order logic. Inference is generally pretty hard: for propositional logic you can do model checking, or you can work with the inference rules directly, which is one of the nice things about having logical rules.
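Model checking for propositional logic can be sketched as brute-force enumeration over truth assignments. The knowledge base below is a made-up example; entailment is tested by checking the query in every model of the KB:

```python
from itertools import product

# Brute-force model checking: does the knowledge base entail the query?
# A formula is represented as a function from an assignment (dict) to bool.
symbols = ['rain', 'wet']
kb = [
    lambda m: (not m['rain']) or m['wet'],  # rain -> wet
    lambda m: m['rain'],                    # rain
]
query = lambda m: m['wet']

def entails(kb, query, symbols):
    # KB entails query iff query holds in every model satisfying the KB.
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if all(f(model) for f in kb) and not query(model):
            return False  # found a model of the KB where the query fails
    return True

print(entails(kb, query, symbols))  # True: this entailment is modus ponens
```

Enumeration is exponential in the number of symbols, which is why working directly with inference rules like modus ponens and resolution can be attractive.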
[00:06:50] Modus ponens and resolution are different inference methods. Sometimes I like to say that logic is about how you can express very complicated things very succinctly. Learning we didn't get a chance to talk about, but there are ways to bring machine learning to logic as well.

[00:07:10] So, hopefully, this is how you see CS221. If you're seeing a lot of this material for the first time, it can be a little overwhelming; there are just so many models, tools, and methods. But I hope that this organization gives you a way to think about how everything fits together. I don't want you to think, "okay, there's nearest neighbors and there's Bayesian networks, how are they related?"
[00:07:38] I want you to think about the trajectory: there's a bunch of models, bucketed into reflex-based, state-based, variable-based, and logic, and then you can do learning on top of that, and that gives you a much more nuanced and holistic picture of all the methods in AI.

[00:08:02] And what's important, I think, is that the individual methods, like whether you use particle filtering, will change over time, and in general, in applications, you might have to use something a bit more sophisticated. So hopefully this class has imparted on you a way of thinking about modeling, inference, and learning as separate, so that whenever you encounter a new algorithm in a paper somewhere, you can incorporate it into your conceptual map.

[00:08:34] Okay, that's all I want to say about the recap. Are there any questions?
[00:08:43] Can you throw some light on what should be the first tool we should use if we are presented with a problem?

[00:08:52] What tool should you try first? It really depends on the problem. These days it's very natural and easy to throw machine learning, supervised classification, at the problem, and that makes sense when your problem involves basically a single action, it's high-dimensional, and you don't really know what else to do with it. But for many problems, like scheduling or route planning, something a bit more structured, you wouldn't necessarily want to start with machine learning, because to start with machine learning you need to gather data, and if you don't have data, then that might not be the best place to start. So I don't think there's any one place to start.
[00:09:43] Hopefully you can think of the CS221 toolbox as the first layer in a breadth-first search: these are the different options you should think about. Is machine learning good, or should this be a search problem, or should this be a Bayesian network, for example?

[00:10:03] A comment from the chat: most current machine learning is reflex-based, a low-level intelligence compared to logic. Interesting point. We are in a very interesting time where a lot of what we see is machine learning, and it's also very impressive how a lot of the so-called reflex-based models are actually capable of doing some fairly sophisticated things. If you think about AlphaGo, yes, there was Monte Carlo tree search that allowed you to actually build a competitive agent, but even just classifying a game board, I mean, that could definitely beat me at Go.
[00:10:49] I think in cognitive science people talk about System 1 and System 2, and the two kind of coexist. System 1 is the reflexive agent, making guesses at what the right thing to do should be, and System 2 is the more rational, well-thought-out, reasoned path. I think we need both, and the two need to coexist and feed off of each other.

[00:11:16] Could you give some examples of ML methods for search problems and logic problems? For search, a lot of things can be cast as search problems. There's a whole field called structured prediction, where your goal is to output a structure, let's say a graph or a sentence or something, and in those cases you often want to learn how to do that.
[00:11:49] So there you actually combine some search techniques: the inference algorithm becomes search, rather than just a feed-forward pass through a neural network, but the learning part is still the same. We didn't talk about structured prediction, I think, but it's nice; you should look at it. I think it's in the slides still: you make a prediction using an inference algorithm, you compare that with the correct prediction, and you do a gradient update.

[00:12:24] And for logic, there are similar things you could do. For example, Markov logic is a way of combining logic with Markov networks, and Markov networks you can estimate using maximum likelihood.

[00:12:42] All right, great questions. Feel free to put more in the chat, but I'm going to move on for now, since there are a few other things to get through.
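The predict-compare-update loop just described is essentially the structured perceptron. Here is a minimal sketch on an invented tagging toy, where "inference" is an exhaustive search over tag sequences (all names and data are made up):

```python
from itertools import product

# Structured perceptron sketch on a made-up tagging toy.
# Score of a tag sequence = sum of weights for (word, tag) pairs.
tags = ['D', 'N']
train = [(('the', 'dog'), ('D', 'N')), (('a', 'cat'), ('D', 'N'))]

def score(w, words, labels):
    return sum(w.get((word, tag), 0.0) for word, tag in zip(words, labels))

def predict(w, words):
    # "Inference" here is brute-force search over all tag sequences.
    return max(product(tags, repeat=len(words)), key=lambda ls: score(w, words, ls))

w = {}
for _ in range(5):
    for words, gold in train:
        guess = predict(w, words)       # predict via inference
        if guess != gold:               # compare with the correct prediction
            # Update: move weights toward the gold structure, away from the guess.
            for word, tag in zip(words, gold):
                w[(word, tag)] = w.get((word, tag), 0.0) + 1.0
            for word, tag in zip(words, guess):
                w[(word, tag)] = w.get((word, tag), 0.0) - 1.0

print(all(predict(w, words) == gold for words, gold in train))
```

For realistic sequence lengths the brute-force `predict` would be replaced by a proper search (e.g. dynamic programming), but the learning loop stays exactly the same, which is the point being made in the answer above.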
[00:12:54] Okay. So now you've taken CS221; maybe this was your first class, maybe you've taken a bunch of other classes. I want to talk about what else is related to CS221. First of all, I'm not going to give you the complete list of courses; you can see the list of AI courses on this website, but that isn't even the whole list of courses which I think are relevant. So what I've done here is try to help us understand what types of courses you might be interested in and why you might be interested in them, and then I'm going to go through each category and give a few examples of the most popular ones.

[00:13:33] The most obvious type of course is: well, we've taken 221, we've learned about some methods, let's learn more about methods; let's learn about Markov chain Monte Carlo in general, for
example. And these tend to be more general purpose. But that's not the only type of course that I think is relevant. Applications are extremely useful; of course we're interested in applications, because that's the real impact of AI, when it's applied to things. But also, in the other direction, often when you take an applied class you learn the method much better than if you take an abstract one, because you appreciate when it works, when it doesn't work, all the nuances.

[00:14:18] And then finally, I would really stress investing in building depth, for both methods and applications. Usually these are courses not in AI: on the methods side, maybe investigating more mathematical foundations; for an applied area, if you're interested in computational biology, take a biology class. I think these
days it's too easy to go through an AI curriculum and not really have much depth, because you can do a lot just by downloading packages and running data. But if you really want to, especially if you're thinking about research, having depth can distinguish you and make you more able to come up with new insights and ideas.

[00:15:08] So let's start with methods. These are categorized by the different topic areas that we've covered in the class. First is machine learning: everyone probably knows about CS229, which is the standard, poster-child machine learning class. Compared to CS221 (this question comes up a lot), it's more mathematical derivations rather than as much
programming, and there are more continuous variables; CS221 tries to shield you from that and deal with discrete variables, just in the interest of scoping. And you learn some fancier things like kernel methods and PCA. So if you really want to dig more into machine learning, that's the class for you.

[00:15:53] If you're looking more for "how do I apply machine learning", especially deep learning, which has been increasingly important, CS230 will tell you how to train these deep neural networks, which have a lot of bells and whistles and things that you need to know about, like dropout and batch norm, to make them work. So if you're really interested in the general practice of how you get deep learning to work, that's the class for you.

[00:16:24] There are three other classes that I want to mention; of course
there's more. The first class is machine learning under distribution shift. We've mentioned a few times that machine learning fails when the training distribution isn't the same as the test distribution; for example, there are adversarial examples, and this class is aimed at telling you what's going on there and what you can do about it.

[00:16:52] Often you think about machine learning as one task at a time, but increasingly we're seeing much more general learning tools that allow you to generalize across multiple tasks, so that's pretty exciting. And finally, we often think about machine learning on single data points, which are less structured, but machine learning can be done in the context of graphs, so there's a class about that as well.

[00:17:21] Then there's reinforcement learning: if
you like the reinforcement learning section and want to know more advanced methods, take CS234. You get to learn about policy search, whereas we've looked more at things like Q-learning, which is a value-function method.

[00:17:39] Then there's decision making under uncertainty, from the Aero/Astro department, which focuses more on model-based methods, if you remember the distinction between model-based and model-free. In more serious applications you really want to have a model of what's going on in the world, and then you can do things like planning rather than just being a reflex agent.

[00:18:04] Generative models: if you enjoyed Bayesian networks and Markov networks, this is the class for you: CS228, Probabilistic Graphical Models. It's a fairly natural extension of the
things that we've talked about: fancier inference algorithms, how you learn the structure, and so on. In the last, I guess, five years there's been a surge of interest in generative models which are supercharged with deep learning. Probably many people have seen GANs generating really photorealistic images; this is all enabled by deep generative models, which build on the principles of generative models, but you combine them with deep learning and you get really interesting results.

[00:18:52] So let's talk a little bit about applications. I'm only going to talk about three applications: vision, language, and robotics. Of course there's computational biology, there's healthcare, and there are other things which I'm not going to have time to mention.

[00:19:05] So, vision. There's kind of a stock,
I mean, the canonical vision class is, I guess at this point, CS231N. It's fairly machine-learning heavy; you learn about convnets and transformers, so it's more general purpose than vision, but you talk about some vision-specific tasks like detection, segmentation, generation.

[00:19:33] There's CS231A, which is more on the vision side. So if you feel like you already know your ML but you really want to learn more about vision, this might be a good class for you, because vision ultimately is about how light works in a 3D world, and so you get into that.

[00:19:57] There's also, I think, a newish class on how AI intersects with graphics, which is kind of a close cousin of vision, and this has some emphasis on generating things, like generating
animation, but also a much more in-depth emphasis on rendering and geometry.

[00:20:22] Okay, so robotics. There's Introduction to Robotics, where you learn how to work with physical models of your robots, how to move arms, and how to relate joint angles to what the robot actually does in the real world. CS237 has a little bit more learning involved, because for more complex robotics tasks you can't really do everything from first principles, so there's some learning involved, but you still need to look at the structure of the robotics problems.

[00:21:00] Language: there are a few language classes. CS224N is the standard language class. It is also ML-heavy, just like CS231N, and it talks about a bunch of different language tasks, like parsing and
translation. CS224U is called Natural Language Understanding; people ask what the difference is between processing and understanding. Historically there used to be a bigger difference, but now with deep learning I think these two classes have much more overlap. You can look at the topics; they're slightly different, maybe more emphasis on, I don't know, semantics. There's a class on applications of virtual assistants. And next quarter I'm actually going to be teaching Understanding and Developing Large Language Models; you might have heard me talk about foundation models or GPT-3 or things like that. Beyond just the technical aspects of how these models work and how they're built, there are a lot of social, ethical, and legal considerations, so we're going to talk about some of those
things, as well as giving you hands-on experience: giving you access to these large language models so you can feel them out and even train some of them yourself. So it should be an exciting, interesting class.

[00:22:25] Okay, so the third category is foundations. There are many types of foundations; these are more mathematical foundations. Convex optimization is a great class to really understand optimization. Most machine learning people these days think, you know, run SGD and that's good, and for many things that's fine, for a kind of sloppier optimization that's fine. But there are cases where you do want to optimize your utility function and you need to do something more serious. Also, I took this class, or a similar class, in grad school, and that's
really when I started understanding linear algebra. So even if you're not interested in optimization, I think it gives you familiarity with thinking about linear algebra.

[00:23:20] Statistical inference: there's a whole host of statistics classes, which are important to think about. Machine learning and statistics clearly have a lot of overlap, but they have different emphases: statistics focuses more on scientific discovery, machine learning more on engineering. So some of the questions you might ask are different: you care about hypothesis testing and confidence intervals and the validity of your inferences, because you don't always have a held-out test set or validation set that you can measure performance against, like you have in engineering. So if
you're thinking about more scientific applications, I think a bit of rigorous statistical thinking would be healthy.

[00:24:04] And there's a class, if you ever wonder why it all works, why machine learning and deep learning are so effective: you can take machine learning theory. It talks quite in-depth about fairly technical probabilistic tools, like uniform convergence, that help you explain, or partially explain, the success of machine learning. It won't fully answer the question of why things work; there's a lot left to be understood. But hopefully it will give you at least a little taste of: oh, okay, now I understand, it's not just all heuristics, there are some statistical principles behind what we're doing.

[00:24:48] Cognitive science and
neuroscience are [00:24:53] science and neuroscience are kind of [00:24:54] kind of other areas that feed into ai [00:24:57] other areas that feed into ai kind of science you can think about as a [00:24:58] kind of science you can think about as a software we're thinking about the human [00:25:00] software we're thinking about the human mind so this class uh talks about using [00:25:03] mind so this class uh talks about using probabilistic programs remember from you [00:25:05] probabilistic programs remember from you know the bayesian networks [00:25:08] know the bayesian networks kind of modules to model human reasons [00:25:10] kind of modules to model human reasons so this is really kind of interesting [00:25:13] so this is really kind of interesting um and then [00:25:14] um and then you can look at neuroscience which is [00:25:16] you can look at neuroscience which is has to do maybe more with a clinical [00:25:18] has to do maybe more with a clinical hardware i mean this is a theoretical [00:25:20] hardware i mean this is a theoretical neuroscience class so it's not actually [00:25:21] neuroscience class so it's not actually going to be real [00:25:22] going to be real um you know [00:25:24] um you know you know hardware so to speak but you [00:25:26] you know hardware so to speak but you ask questions like you know what is the [00:25:29] ask questions like you know what is the um you know back propagation which is [00:25:31] um you know back propagation which is the bread and butter of um you know deep [00:25:33] the bread and butter of um you know deep learning actually you the brain can't [00:25:36] learning actually you the brain can't implement that because it's not a local [00:25:39] implement that because it's not a local kind of rule so people have been [00:25:41] kind of rule so people have been interested in these questions like what [00:25:42] interested in these questions like what is kind of a really plausible [00:25:45] is kind of a really 
approximation that explains learning? So there's a pretty interesting open question there.

[00:25:51] Okay, to summarize. Here are the types of classes. Methods: this is going straight ahead, in some sense; you learn about more advanced, general-purpose techniques. All good, but I would really encourage you to also think about applications of AI, especially things that really interest you and that you're passionate about; again, they really help you understand and appreciate the methods that you're learning. And do invest some time in investing in depth. And there are a lot of classes outside AI at Stanford, so definitely explore and don't limit yourself just to AI classes.

[00:26:32] So, just some general tips. Beyond taking classes, there are a lot of resources online: talks, tutorials, blog posts.
It's information rich, and you can learn a lot just from watching things online, if that mode of learning works for you. Some people prefer downloading code and tinkering; a lot of stuff, thankfully, is still open source, and people release their code and tutorials. And just talk to professors and other students, not just about what classes to take, but about how they think about AI and the world, because a lot of learning is not written down in some sort of formulaic textbook. I think the field is moving so fast that sometimes it's just in the heads of a few people.

[00:27:31] All right, so that's the end of the second section. I'll take any questions now.

[00:27:37] Is it okay to take 230 without first taking 229? I believe the answer is yes.
If anyone has taken these, feel free to chime in. I think, especially if you've taken 221, that should be more than enough to take 230. 221 really gets you there: you derive a lot of different learning algorithms, and 229 has you think about things like mixtures of Gaussians and so on, which aren't needed if you're just interested in applying deep learning.

[00:28:12] Any other questions? What would be the best way to talk to professors and other students? That is a good question. I guess Ed is probably not going to be super... I mean, it's probably going to go dead after the course. I guess email is always an option. I mean, the best time to talk to a professor or other students is during the quarter, when they're holding office hours and everything. But maybe, you know, after the course, there are still
[00:28:45] But even after the course, some professors still have office hours.
[00:28:47] Is it okay to take 224N without previous experience in deep learning? The short answer is yes, again conditioned on having taken 221, so you have the basics; 224N starts with some of the basics of deep learning, so you can get by. If you can take the deep learning class first, I think that's better. There's always this thing where the more prerequisites you take, the more time you'll be able to spend actually enjoying the language aspect rather than the deep learning aspect.
[00:29:30] Will CS224N be offered online later? It's going to be offered in person in the winter, so no. In the future? It's definitely a possibility; we haven't thought that far in advance.
[00:29:50] It depends on how much interest there is, I guess.
[00:29:55] What are classes that would be offered remote? For that you'd have to check; I don't know which ones are remote versus in person. I think by default everything is going to attempt to be in person.
[00:30:21] All right, I think that's a lull in the questions, so let me move on to the third part.
[00:30:30] Okay, so now we get to step back and think about where AI as a field is going. If you think about where we are today in AI, I think of AlphaGo as a quintessential image that captures the progress and the optimism that we're feeling today: a very bold effort that surprised a lot of people, experts in both Go and AI.
[00:31:10] And it was really a triumph of sorts for AI, machine learning, and deep learning. You see this optimism and boldness continued with things like GPT-3, which came out last year: OpenAI released this large language model, 175 billion parameters, trained on a huge amount of text, orders of magnitude larger than the previous models. And the cost was something like four or five million dollars (billion would be a lot). One thing that's interesting is that it's just a language model. Remember, a language model is just something that takes a context and predicts the next word. You might think this is the world's most boring task; why would you want to just predict the next word? But it turns out that if you do this at scale, you can do all sorts of other things.
[00:32:05] You can get it to convert natural language into SQL queries, or have it do question answering in a dialogue format. It doesn't do any of these particularly well, but the mere fact that you now have a single model, which wasn't trained for these tasks, doing anything sensible is impressive. The question isn't how well the bear dances; it's that the bear dances at all, in some ways. This has led to a whole era of large models which are really improving accuracy across the board, mostly on language tasks for now, but you see it in vision as well.
[00:32:49] And it's this optimism and progress that's really leading to AI being deployed across a countless number of different areas, from consumer services like Facebook and Google to many other areas as well.
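To make the "predict the next word" objective concrete, here is a toy sketch in Python. It is just a bigram counter over a made-up corpus, nothing like GPT-3's architecture or scale, but the training signal is conceptually the same: given a context, predict what comes next.

```python
from collections import Counter, defaultdict

# Toy illustration of the language-modeling objective: count how often each
# word follows each context word in a (made-up) corpus, then predict the
# most frequent continuation. GPT-3 does this with long contexts and 175B
# parameters, but the task it is trained on is conceptually this one.
corpus = "the cat sat on the mat the cat ate the fish".split()

following = defaultdict(Counter)
for context, nxt in zip(corpus, corpus[1:]):
    following[context][nxt] += 1

def predict_next(context_word):
    """Return the most likely next word after `context_word`, or None."""
    counts = following[context_word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat": it follows "the" more often than "mat" or "fish"
```

The surprising point in the lecture is that scaling this same objective up, on vastly more text, yields a model that can be coaxed into translation, SQL generation, and question answering without task-specific training.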
[00:33:10] Obviously to a lesser extent there, because those areas don't have as much AI expertise as the tech giants. It's also being applied in many areas like education, credit, and employment, which really start to affect people. That said, some of these AI systems are logistic regression, not GPT-3; many of them are actually closer to logistic regression than to GPT-3. But nonetheless, this whole umbrella of using data-driven methods to automate certain types of decision making is a general trend that encompasses many different regimes.
[00:33:51] So what I want to spend the last part of the lecture reflecting on is the societal impact of this trend, having spent a whole quarter talking about the technology.
[00:34:07] I just want to use a simple example: machine translation. Many of you have probably used it, and it's one application where quality has improved significantly due to advances in AI, which is great. It can help break down language barriers, increase accessibility, improve the productivity of the economy, and so on. So this is generally positive. But you should always look at the flip side of things: while these systems are ubiquitous, they have problems. For example, Hungarian is a language that doesn't distinguish between female and male third-person pronouns, so when you translate into English, the system has to guess the gender of each pronoun, and you can see that it patterns very stereotypically along professional stereotypes.
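A minimal sketch of how that guess can go wrong: if a system simply picks the pronoun that co-occurred most often with a profession in its training data, it reproduces whatever skew the data carries. The co-occurrence counts below are invented for illustration, and this is not how any real translation system is implemented; it just shows the mechanism.

```python
# Toy sketch (invented counts, not a real MT system): a purely statistical
# "translator" that, for a gender-neutral source pronoun, emits whichever
# English pronoun co-occurred most often with the profession in training data.
cooccurrence = {
    "nurse":    {"she": 90, "he": 10},   # hypothetical corpus counts
    "engineer": {"she": 15, "he": 85},
}

def guess_pronoun(profession):
    """Pick the pronoun seen most often with this profession."""
    counts = cooccurrence[profession]
    return max(counts, key=counts.get)

# The output reproduces the skew in the data, not anything about the person:
print(guess_pronoun("nurse"), guess_pronoun("engineer"))
```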
[00:35:08] And these kinds of biases, if you don't bother to think about them, maybe don't seem to raise an alarm. But I think this is actually a frog-in-boiling-water kind of setting, where bias starts creeping insidiously throughout society and gets amplified, so it's something to really be careful of.
[00:35:34] There's also weirder stuff. There's one example from a few years ago in Māori, which is a language that doesn't have that much data: you type in some nonsense like "dog dog dog" and you get some really disturbing stuff coming out, and no one really knows why this happens. I think many of these issues are due to the fact that machine learning thrives on complex models fitting spurious correlations in data.
[00:36:07] So it's like we're pushing the limits of what we can do, and that's the outlook I think the field has had for quite some time. You have to remember that even ten years ago, computer vision basically didn't work. People spent decades trying to get things to work at all, and now things work well; now we have other things to worry about.
[00:36:35] So I want to highlight something called spurious correlations, which I think is a cautionary tale. Here's a task: you take an X-ray image of a chest, and you're trying to predict whether there's a collapsed lung or not. If you apply standard computer vision machinery, this works pretty well.
[00:37:04] But take a closer look at this image and see that tube coming out here. That's called a chest drain, and it's a common treatment for collapsed lungs. It turns out this is one of the signals the model is picking up on: it effectively says, hey, this person was treated for a collapsed lung, therefore he has a collapsed lung. So if you look at the accuracies, the AUC, on the entire population versus on the people who have chest drains, you're predicting on the latter much more accurately than on the people without chest drains. It might seem like you're doing pretty well overall, but on the segment of the population that doesn't have chest drains, you're actually doing not so well.
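One way to catch this in practice is to slice the evaluation by subgroup instead of reporting a single aggregate number. This is a minimal sketch with invented labels and predictions, where a hypothetical `has_drain` flag marks the treated patients the model gets "for free":

```python
# Invented data: (has_drain, true_label, model_prediction). The hypothetical
# model is perfect on patients with a chest drain (the drain gives the answer
# away) and poor on the untreated patients we actually care about.
records = [
    (True,  1, 1), (True,  1, 1), (True,  1, 1), (True,  0, 0),
    (False, 1, 0), (False, 0, 0), (False, 1, 0), (False, 0, 1),
]

def accuracy(rows):
    return sum(y == pred for _, y, pred in rows) / len(rows)

overall  = accuracy(records)
drain    = accuracy([r for r in records if r[0]])
no_drain = accuracy([r for r in records if not r[0]])
print(overall, drain, no_drain)  # the aggregate hides the weak subgroup
```

The same idea applies with AUC or any other metric: compute it per subpopulation, not just on the pooled test set.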
[00:38:01] And that is exactly the subpopulation of untreated patients that you actually care about, because if they already have a chest drain, you don't need a prediction of whether they have a collapsed lung or not. So this is a cautionary tale: you need to not just look at the accuracy but really understand how the model is actually making its predictions, because if it's just latching onto spurious correlations and you go deploy it, it might not be so good.
[00:38:25] Here's another example. Suppose you're trying to figure out the effect of a treatment on the survival of patients, and here's the data from some study: for untreated patients, eighty percent survive, and for treated patients, thirty percent survive. So the question is: does the treatment help? How many people think it helps? Maybe raise your hand if you think it doesn't help, or just put something into the chat.
[00:39:01] That's fine too; I'm trying to make this a little bit interactive and get people to think. "Doesn't help." "That's possible." "Unclear whether it helps." Yeah, "who knows" is right. If you're very naive about it, you might think, okay, survival is correlated with not treating. But exactly: sick people are more likely to undergo treatment. So there's a hidden confounder here, which is how sick you are, and this data doesn't tell you anything. If you're just doing machine learning naively, you could really be doing completely the wrong thing. There's this whole field of causal inference which provides rigorous machinery to help you answer these kinds of questions, especially in high-stakes medical settings where there may be a lack of data and a lack of ground truth.
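The survival numbers above can be reproduced qualitatively with a tiny simulation. In this sketch (all probabilities invented), the treatment raises every patient's survival chance, yet the naive comparison still makes it look harmful, because sicker patients are the ones who get treated:

```python
import random

# Toy confounder simulation: by construction the treatment HELPS everyone
# (+0.2 survival probability), but sick patients are far more likely to be
# treated, so the naive treated-vs-untreated comparison reverses the effect.
random.seed(0)

def survival_prob(sick, treated):
    base = 0.3 if sick else 0.9          # severity drives survival...
    return min(base + (0.2 if treated else 0.0), 1.0)

patients = []
for _ in range(100_000):
    sick = random.random() < 0.5
    treated = random.random() < (0.9 if sick else 0.1)  # ...and treatment
    survived = random.random() < survival_prob(sick, treated)
    patients.append((treated, survived))

def rate(rows):
    return sum(s for _, s in rows) / len(rows)

treated_rate   = rate([p for p in patients if p[0]])
untreated_rate = rate([p for p in patients if not p[0]])
print(f"treated: {treated_rate:.2f}, untreated: {untreated_rate:.2f}")
# Treated patients survive less often in aggregate, even though the
# treatment helps every individual: severity is the hidden confounder.
```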
[00:40:04] You really, really need to be careful. For machine translation, you can get a human to look at the sentence and say, yep, seems reasonable, and that's the more typical engineering attitude: I try it, and I can always validate whether it works. When you can validate that it works, maybe it's okay to use something you don't fully understand. But when you have to rely on it and there's no validation, I think you have to lean much more on first principles.
[00:40:37] So the cautionary tale is: always be aware of the limitations of a technology, and machine learning definitely has a lot of limitations. I think it's really important that you don't walk away from this class thinking, oh yes, machine learning, now I know how to do SGD, I get a dataset, and I can just go with it.
[00:41:00] You have to be aware of the limitations.
[00:41:02] All right, so now the second part of this module: I'm going to talk about AI ethics. Many of you have probably heard the term AI ethics thrown around, and there's often a lot of heat around this term: ethics, people are not being ethical, what's going on here. At the broadest level, it's about how we ensure that AI is developed to benefit society and not harm society. That sounds, well, not easy, but uncontroversial, right? And there are a lot of principles, and people have written a lot about this. I'm not an ethicist, so I can't speak to this in great depth.
[00:41:52] But starting with the Belmont Report from 1979 on human subjects research, there's the ACM Code of Ethics, and all these companies are now putting out all kinds of responsible AI principles, so it seems like there are a lot of guidelines, which is good. Often these things say things like: respect persons, do no harm to people. You look at that and say, okay, well, yeah, I don't want to harm people. But the real question is how these high-level principles relate to the concrete actions you take, because a lot of these ethical issues aren't about malice or misguided intent; they really have to do with ignorance. If you're not aware of something, bad things can happen.
[00:42:49] So what I'm going to do is walk through a few specific considerations under this umbrella of AI ethics, which hopefully gives you a bit more concrete guidance. The considerations are: data; the objectives you optimize; inequality, which we've talked about before; the idea of harmful applications, for which I'm going to do a survey, so get ready for that; and then automation versus augmentation.
[00:43:24] All right, so moving on: data. AI is largely powered by machine learning, and without data there's no machine learning. So we naturally have to ask: what is this data we're talking about? Here's an example. There's a dataset called Tiny Images, 80 million images, which had been used since 2006 in the computer vision community, and it was actually taken down because it was found to have various kinds of offensive content in it.
image then had some of [00:44:08] know in it even image then had some of these kind of uh [00:44:09] these kind of uh um objectionable you know issues and it [00:44:12] um objectionable you know issues and it was kind of cleaned up you know [00:44:14] was kind of cleaned up you know afterwards [00:44:15] afterwards so [00:44:16] so you know a lot of the times you know ai [00:44:18] you know a lot of the times you know ai systems are relying on web script data [00:44:20] systems are relying on web script data and we know sometimes on the web it's [00:44:23] and we know sometimes on the web it's not a prepaid place and if you're just [00:44:25] not a prepaid place and if you're just scraping data and not very carefully [00:44:26] scraping data and not very carefully looking at it you can kind of inherit a [00:44:29] looking at it you can kind of inherit a lot of these kind of offensive [00:44:32] lot of these kind of offensive material [00:44:35] material uh [00:44:36] uh second of all um [00:44:39] second of all um you know there are historical biases [00:44:41] you know there are historical biases inherent in data so kind of social [00:44:43] inherent in data so kind of social biases with race and gender um even if [00:44:46] biases with race and gender um even if they're not offensive the idea that you [00:44:49] they're not offensive the idea that you know maybe the lack of representation of [00:44:51] know maybe the lack of representation of certain kind of marginalized populations [00:44:54] certain kind of marginalized populations is itself kind of a problem so you have [00:44:56] is itself kind of a problem so you have kind of two types of problems one is you [00:44:58] kind of two types of problems one is you can represent people or uh badly or you [00:45:02] can represent people or uh badly or you cannot represent them and both are you [00:45:04] cannot represent them and both are you know things to worry about [00:45:07] know things to worry about so 
[00:45:10] There's another thing that people don't normally think about when it comes to data. Say I go on a vacation, take a picture of my dog, and post it on Flickr, and then some big tech company scrapes it, does some pre-training on it, and uses it to do scene classification. Is this good? I mean, should it be allowed? Right now we're pretty laissez-faire about this: internet scrapes are the norm, and there's no consent. A lot of things are copyrighted, and I'm sure there are tons of potential copyright violations. So there's a question here: data is produced by people for doing certain activities, right?
write a book i send [00:46:13] post an article i write a book i send messages to people [00:46:14] messages to people and [00:46:15] and you know machine learning is something [00:46:17] you know machine learning is something that kind of sits on top and kind of [00:46:19] that kind of sits on top and kind of siphons that data for [00:46:20] siphons that data for usually another purpose [00:46:22] usually another purpose and the question is [00:46:24] and the question is you know whether [00:46:25] you know whether what right do i have to say like no that [00:46:29] what right do i have to say like no that should be allowed or not allowed and [00:46:30] should be allowed or not allowed and often this kind of goes without even [00:46:33] often this kind of goes without even your users being aware of what's [00:46:35] your users being aware of what's happening [00:46:39] so another piece of data that is [00:46:41] so another piece of data that is important is um [00:46:44] important is um you know how much work it takes to [00:46:46] you know how much work it takes to produce it [00:46:47] produce it so often we think about you know [00:46:49] so often we think about you know technology [00:46:50] technology and machine learning methods because [00:46:53] and machine learning methods because that's kind of well i mean from a [00:46:55] that's kind of well i mean from a computer science perspective that's kind [00:46:56] computer science perspective that's kind of the object of study [00:46:59] of the object of study uh but more and more i think it's [00:47:02] uh but more and more i think it's important to be aware that [00:47:04] important to be aware that um [00:47:05] um you know data takes you know what's [00:47:07] you know data takes you know what's powering all these things [00:47:09] powering all these things right like you think of ai as reducing [00:47:11] right like you think of ai as reducing human labor it makes things more [00:47:13] human labor it 
It makes things more efficient and so on, but it's not free; it requires resources. There's an excellent book by Mary Gray and Siddharth Suri called Ghost Work that documents the amount of human labor, usually crowdsourcing, that is used to create datasets or to moderate and flag content, which powers these AI systems. So a lot of AI systems have this veneer of being automated, but really they're being powered by people at some level.
[00:47:47] There's one example I want to point out, which is good food for thought. In machine learning, we like to think about the distinction between labeled data, which is really expensive to obtain because you have to pay people to label it, and unlabeled data, which is cheap or even free. But if you think about it, going back to what I said about data being created by people expending effort: think about quote-unquote "raw text," books and articles. It's free because, well, we just took someone else's book, which they spent a whole year writing, and we didn't pay them for it. That's why it's free. So it's important to keep the perspective that a lot of machine learning is deriving value from the labor of people who are not being compensated for it. Just a little bit of perspective.
[00:48:51] All right, so the second topic is objectives. Optimization is touted throughout this class as a powerful paradigm: it allows you to express a desire in the form of an objective function, and then separate that desire from the resources and algorithms that realize it.
You make a wish, and then you can get it to come true; that's the power of optimization. But the question is: what should the objective be? Ideally it would be something like happiness or productivity, but usually those things are impossible to measure, so we often use surrogates for them. And then, okay, we're not getting the thing we actually care about, because the surrogates are approximations. Furthermore, there are different incentives: businesses are incentivized to maximize profit. Nothing against them, that's what they're designed to do, but that's not always aligned with the social good.
[00:50:00] Just an example: most internet companies use clicks or views as a major component of their objective function. Why? Because it's the signal they have, and it's really good at driving up profit. Usually it does reasonable things; it gives you what you say you want. But obviously, people's reflexive actions at any given moment don't necessarily represent their long-term goals, and moreover, at a societal level, we see that this leads to potentially big problems like polarization, which is a whole other topic that I won't get into. So I think it's always important to think about what objectives you've set out to optimize, and to beware of any surrogates or misaligned incentives.
[00:50:49] Inequality: we talked about this in the machine learning lecture. Remember the Gender Shades project.
Image recognizers there worked differently on different populations of people. What do you do about this? Well, you can collect more data for certain groups, but often this is hard and expensive to do, so there might not be incentives to do it unless there's regulation. [00:51:27] So one solution is data. A second solution is in the methods: we looked at how you can minimize the maximum group loss using group DRO, and mitigate some of these performance disparities. Of course, it's a big philosophical question what kinds of tradeoffs you want to make, which you've had the opportunity to reflect on in the homework. But one thing I want to mention is the idea of auditing, which I think is a really powerful force.
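As an aside, the "minimize the maximum group loss" idea can be sketched in code. The variant below takes each gradient step on whichever group currently has the highest loss; the toy data, the linear model, and this greedy worst-group update are illustrative assumptions, not the exact formulation from the homework (full group DRO maintains soft weights over groups rather than picking a single worst one).

```python
# Minimal sketch of group DRO for a linear classifier: instead of
# minimizing the average loss over all examples, take a gradient step
# on the loss of the *worst-off* group at each iteration.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two groups with the same labeling rule (sign of feature 0)
# but shifted input distributions; group b is much smaller.
X_a = rng.normal(0.0, 1.0, size=(100, 2))
X_b = rng.normal(0.0, 1.0, size=(20, 2))
X_b[:, 1] += 2.0                                # distribution shift
y_a = (X_a[:, 0] > 0).astype(float)
y_b = (X_b[:, 0] > 0).astype(float)
groups = [(X_a, y_a), (X_b, y_b)]

w = np.zeros(2)

def group_loss(w, X, y):
    """Average logistic loss of one group (labels in {0, 1})."""
    margin = (2 * y - 1) * (X @ w)
    return np.mean(np.log1p(np.exp(-margin)))

for step in range(500):
    losses = [group_loss(w, X, y) for X, y in groups]
    worst = int(np.argmax(losses))              # group DRO: focus on worst group
    X, y = groups[worst]
    s = 1.0 / (1.0 + np.exp(-(X @ w)))          # sigmoid predictions
    grad = X.T @ (s - y) / len(y)               # gradient for worst group only
    w -= 0.1 * grad

final = [group_loss(w, X, y) for X, y in groups]
print("per-group losses:", [round(l, 3) for l in final])
print("max group loss:", round(max(final), 3))
```

The point of the objective is exactly the tradeoff discussed above: you may give up some average accuracy, but the group that the model serves worst is never ignored.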
A lot of systems, especially commercial systems: you don't really know what's going on inside them. And in the Gender Shades example, after the study came out, companies were incentivized to fix the problem, and after a period of time the disparities had largely vanished. So the mere fact of studying what systems are doing can sometimes be enough to incentivize companies to take action.
[00:52:34] Okay, so now I'm going to do a little bit of audience participation, so get ready to raise your hands. There's a question of which applications are ethically okay; this is going to be interesting. And moreover, when a researcher makes an advance, how do you assess its potential harms?
[00:53:02] Here's one: autonomous weapon systems, powered by AI, that can track objects and fire missiles or whatever. Maybe you can vote: thumbs up if you think this is an okay application, thumbs down if you think it's not.
[00:53:27] Okay, so I think most people who voted, if not all, say no. This should be an easy case; I don't know what everyone else thinks, but hopefully it is. Maybe you could make a case for it, but I think this is largely regarded in the community as very, very ethically problematic, at the very least.
[00:54:00] [A student:] I wrote yes for a defensive role, because I was thinking: if there were an attack, and an autonomous system like Israel's were able to intercept missiles, and that protected a lot of people, then from that perspective it might be good to have automation.
[00:54:27] Yeah, I'm not going to get into the detailed argument; I think you can debate these things for a long time. But I'm trying to lay out a spectrum, and I think this is one example that most people consider very problematic. So what about deepfakes? Again, vote: how many of you think this is okay technology, versus not okay? They have genuine use cases, maybe in entertainment, if you want to create an avatar or pretend you're a celebrity. But on the other hand, of course, as this picture points out, you can fake Barack Obama or some head of state; you can doctor a video that gets them to say anything. So this is potentially pretty problematic from the point of view of disinformation, where people can't tell what's fact and what's fiction.
So what about image generation? This gets a little more interesting. Suppose you're not generating Barack Obama, you're just generating cute puppies. How many of you think this is okay? Cute puppies, yeah; who doesn't want more cute puppies? Okay, so more yeses.
[00:56:00] Intuitively it feels like it should be okay, because you've distanced yourself from any sort of weaponry. But the truth is that a lot of these methods are actually general-purpose: if you can generate cute puppies, you can generate Barack Obama. This is where I think the ethical dilemma comes in, this idea of dual-use technology. The same technology can be used to put a smile on your face or to spread disinformation. I'm not going to offer a solution, even if there were one, but the process of thinking about this while you're doing AI is extremely important.
[00:57:00] And you can go even farther and ask: what about deep learning? Maybe most people would say deep learning is probably fine, because there are so many good things you can do with it. But some people would argue that developing technology which enables large organizations to amass data and centralize power is inherently evil; you could take that position as well. So there's no right or wrong answer here, just a spectrum of viewpoints.
A lot of AI ethics is this process of debate and reflection. It should not be: here are the principles, and if you just follow them you get a stamp of approval. It's not like that at all. It's about internalizing these questions and carrying them with you at all times.
[00:58:00] Okay, so the final thing I want to talk about is automation versus augmentation. You see a lot of news like: AI is dangerous because it can replace jobs, or you might think about an AI that goes rogue. I think a lot of this, whether it's a real worry or not, has to do with the framing itself. Ever since the inception of AI, there has been this idea of an agent that is supposed to be intelligent.
It can do things in the world, and once you call it an agent, that means it has agency; it's its own entity in some sense. And if you frame it like that, then you're fighting an uphill battle to coax it into being aligned with human values: wait, wait, no, I didn't mean that, let's get it to do what we want. This is deeply ingrained in the framing of AI, from things like the Turing test, which is about an agent that can deceive a human being, for what it's worth; RL agents that are autonomous and doing their own thing; and the whole idea of AGI, artificial general intelligence. And this leads to a very explicit automation perspective: well, you have an agent, it's doing things, and now it's going to do the thing that the human was doing.
[00:59:42] Now, if you go back to the 1950s, there was another interesting line of thinking called IA: intelligence augmentation, or amplification. It was about creating tools that help humans, and it is in some ways a predecessor of HCI, human-computer interaction, which focuses on augmenting human abilities. I think this perspective lets us sidestep a lot of these problems, because baked into the premise of IA is that we are trying to make humans smarter or faster or whatever; it's human-centric, as opposed to agent-centric. A lot of the interesting AI moonshots, like the Turing test or an agent that can play chess, would not have been pursued under an IA agenda. It's clear today that AI, by focusing on the agent perspective, has led to a lot of powerful technology. But it's also clear that we need a lot more IA thinking to help shape this technology, because fundamentally we should be developing AI to improve the human condition.
[01:01:15] Okay, so this is the final slide. AI is a technology, and like most powerful technologies, it's a dual-use technology. That means it can improve efficiency, accessibility, productivity, dare I say happiness, I don't know; it can do a lot of good things. But it can also do a lot of damage. It can be explicitly used to harm people, but even putting that aside, harm can come simply from not being cognizant of certain issues.
AI can exacerbate social inequalities; you can do harm without people even thinking about it. That is why I want to stress so much the idea of being aware of these issues; I think that's half the battle in making progress here. And the final takeaway: just because you can build something doesn't necessarily mean you should. Maybe you should, maybe you shouldn't; you should always ask yourself what the benefits and the risks are. This might sometimes mean slowing down and challenging the status quo, which is uncomfortable, because we're used to charging ahead in the march of progress. There aren't any easy answers, but I think really mindful deliberation can go a long way toward making AI more ethical.
[01:02:45] All right, so that is the end of the lecture.
Hopefully you guys learned a lot, and hopefully this was good food for thought. Please give feedback on the course evaluation on Axess, and thanks for an exciting quarter.
[01:03:08] All right, I think we have a bit of time for questions.
[A student:] I'd like to hear your thoughts on the slide you had about deepfakes and picture generation and things like that. I feel like, a while ago, if you wanted to verify something, you would read it from a verifying source on the internet, and then it was like: if you have a video of a person talking, that's more reliable, because written content you can't trust. But now, with deepfakes, videos are not so reliable either; there's a sense of an erosion of verifiable truth in digital content. What would your thoughts be on that?
[01:03:55] Yeah, I mean, everything you said is true.
true. We can't really trust what we see online, and this is going to be even more true going into the future. I don't think all hope is lost; it just means we need to reset our expectations and have other mechanisms for validation.

[01:04:27] I think there are non-AI things you could attempt to do. For example, authentication of provenance: this video, or this image, or this text was actually produced by this entity, at this particular time and place, and it was certified. And you have to design a kind of secure mechanism for authenticating, so this is more in the realm of security.

[01:05:02] But another example is: Photoshop exists, and I think we're all okay. You know, I
think we, maybe... I mean, video might be a little bit more visceral in some sense, but routinely there are images that can easily be photoshopped with high fidelity, and we don't necessarily trust those. So what I guess I'm trying to do is not sound too pessimistic about the future: there are things we can do, but we do need to do them.

[01:05:43] And I think, when developing our technology, it's going to happen eventually; most of this is just buying us time, slowing things down enough so that we have time to react. I think in 20 years there's no way you can stop people from having deepfakes. I mean, it's much, much earlier than 20 years, but
just to give an upper bound.

[01:06:15] Thank you. [01:06:23] [Music] [01:06:30] ...less and less harmful in the long run, just like, if everyone in the world can train their models?

[01:06:44] Not necessarily, I think. I don't like using an analogy, but if you imagine everyone can build a nuke in their backyard, that doesn't mean that things are better. It could just lead to an arms race between attackers and defenders, as well; whoever has the most powerful model can kind of win. I mean, it's really interesting, because I've been such a big proponent of transparency and openness, and in research you just put everything out, right? You're supposed to; that's the whole idea of science. But sometimes there are technologies, and there's a situation where maybe it can do harm.

[01:07:54] ...sense. [01:07:57] Yeah. One question is: what
constitutes common sense? Uh, there has been a bunch of work in common sense reasoning in the last five years. Yejin Choi, a professor at the University of Washington, has done a lot of excellent work in this area. Common sense reasoning used to be talked about, you know, before machine learning, and then people didn't really work on it, but now it's kind of coming back. But it's tricky; it's really a slippery concept, what constitutes common sense and how you get your hands around it.

[01:08:36] Thank you. [01:08:40] Any other questions?

[01:08:44] What are some good sources to know what is the latest happening in the AI industry? [01:08:52] What is the latest, sorry, happening in the AI industry? Yes, so, in different fields, what is the new technology coming up for applications to build on?

[01:09:06] Um, you're asking just generally what happened? So you're asking for references where
you can find out about how to keep up with the latest AI? Um, yeah, I don't know if there's a definitive source. I mean, arXiv, I guess, provides a feed of the latest papers, often. Blog posts or Twitter: people post a lot of recent advances there. I guess social media, for lack of a better, concrete description.

[01:09:54] I would say that it is a very biased sample: it's the things that are generally done in research, done in prominent research labs, which is good. I mean, yeah, follow other ML researchers on Twitter; that's how you learn about stuff. I think there's also a lot of AI in private organizations where people aren't
publishing, and there it can be hard to figure out what's going on.

[01:10:32] Anything else? [01:10:35] Why do people publish models even if they are expensive?

[01:10:43] There are, I think, multiple reasons. Publishing models allows other people to build on top of your work, so it's good for the community to have more sharing; you can make progress faster. Also, on a kind of selfish note, you get recognition if other people build on top of your work; that's kind of the academic model in some sense.

[01:11:14] Okay, well, if there's nothing else, then let's end there. Thanks everyone again for coming to the last lecture. After a whole quarter of modules, I guess it's kind of nice to get a little bit of live interaction, although I've seen many of you at the faculty chats, so that's been nice. But yeah, good luck with the rest of your quarter, and see you next time.
================================================================================
LECTURE 057
================================================================================
Stanford CS221 | Externalities and Dual-Use Technologies | 2023
Source: https://www.youtube.com/watch?v=2xQLCXqOtdU
---
Transcript

[00:00:07] Hello, this is your embedded ethics team. In this video we will be discussing externalities and dual-use technologies, to help you answer the homework questions. We will define externalities and dual-use technologies, two concepts that relate to how AI both positively and negatively impacts society. To help make these definitions clear, we will be going over several examples. We will also provide some theoretical background on these concepts that will help you be proactive in identifying externalities and dual-use technologies in the future.

[00:00:43] First, let's begin by looking at externalities. An externality is a consequence, positive or negative, that arises from one party's action and impacts another party. Externalities are the result of either the
production or consumption of a good or service. For example, when I produce electricity by burning coal, I produce electricity efficiently but release pollutants into the air, impacting the people around me; this is a negative externality. When I maintain my house's yard well, it raises the property value of my neighbors' houses; this is a positive externality. The impact of the externality can be private, affecting an individual or organization, or social, affecting society as a whole. Sometimes technology can have both positive and negative externalities, and sometimes it's a little less clear whether the externality is positive or negative.

[00:01:32] Let's take a look at an example. AncestryDNA, 23andMe, and a variety of other services provide ancestry testing by using genetic data to estimate the geographic origins of a person's ancestors. To obtain this service, users
provide the company with a DNA sample. There are both positive and negative externalities that arise from this. The positive externalities include the ability to connect individuals with their biological family members, or to inform them about genetic predispositions and health risks. The negative externalities include selling genetic information to third parties and mishandling data. Additionally, ancestry testing has been used to find and convict criminals by mapping out a family tree of distant relatives until a suspect was identified; this was the case with the Golden State Killer. Depending on your viewpoint, the use of ancestry testing to identify these individuals could be considered a positive or a negative externality.

[00:02:23] Externalities reflect the consequence of an action from one party onto another. Now we will talk about dual-use technologies, which refer to the
impact that arises from secondary usage of a specific technology. The dual-use dilemma is a phenomenon where a technology, or a product of research, has a dual effect of positive and negative consequences. This concept arose out of bioethics in medicine, a field where medical innovation often leads to inadvertently tragic or even fatal outcomes.

[00:02:58] A classic example of a dual-use technology is the Manhattan Project, headed by the US government during World War II. Let's talk through how this technology is dual use. When Oppenheimer began his research into theoretical physics, he did not intend to create a bomb, but during the volatile political climate of that time, his strictly academic research bled into the public and geopolitical sphere, in an arms race with the Nazi regime. There were definitely positive outcomes of Oppenheimer's work. The first was a purely intellectual one: the academic
freedom to uninhibitedly participate in intellectual inquiry. The second was the immense potential for nuclear research to be used in ways that benefit society, for example providing a clean energy source.

[00:03:44] However, there were also significant harms that arose out of Oppenheimer's work. For instance, the product of his work, the atomic bomb, was dropped on Hiroshima and Nagasaki, killing nearly 230,000 people. Who is to take responsibility for this?

[00:04:04] An important thing to remember is that sometimes the thing you intend for your technology to do is not the only thing it can or will do. Since technology must always be created with this understanding, it's important to be proactive in thinking about dual-use outcomes. Some dual uses for a certain technology will be easier to predict than others. Let's walk through four scenarios
to guide your thinking about potential dual-use cases. To make this concrete, we'll also consider an example throughout this slide: specifically, large language model chatbots like ChatGPT.

[00:04:44] We first begin by thinking about the intended outcomes: how you expect your technology to behave. For example, OpenAI says that the purpose of ChatGPT is to follow an instruction in a prompt and provide a detailed response.

[00:05:01] The second scenario is unintended but foreseen outcomes. These are behaviors or actions that your technology exhibits that were not designed for intentionally, but that the designers did conceive of. For example, OpenAI knew that there could be false information disseminated through ChatGPT, since it is only a large language model, not any definitive source of information.

[00:05:27] The third scenario is unintended but foreseeable outcomes. This
is a superset of the outcomes captured by the second scenario: it includes all outcomes that could have been reasonably foreseen by the designers, even if the designers did not actually foresee them. For example, ChatGPT has a huge potential for displacing human workers, including those who perform jobs that require specialized skill sets. OpenAI is doing work to address this issue, but all this work is retroactive.

[00:06:01] The fourth scenario is unforeseen, and possibly impossible to have foreseen, outcomes. These are unintended outcomes that would have been unreasonable to foresee. For example, last year journalist Kevin Roose reported that during his lengthy and personal conversation with Bing's chatbot, it professed its love to him. Microsoft was then in a flurry to determine the root cause of this erratic behavior, and ultimately decided that it was a case of
hallucination.

[00:06:34] Another example of dual use in the context of AI is current research on developing machine learning models that identify toxicity in liquids. Let's think about how this can be an example of a dual-use technology. The positive effects of this technology are plentiful: currently, less than 1% of chemicals in commercial use in the US have undergone toxicity characterization, and the characterization process is so laborious and costly that chemical growth vastly outweighs the capacity to characterize them. However, these models could also be developed to engineer viruses or toxins; they could even be used further to target specific individuals or communities. So we really need to think about how we keep individuals or institutions responsible for self-regulating and anticipating these outcomes.

[00:07:23] Now, this can be hard, because dual-use technologies are not created in
a vacuum. Dual-use technologies are a product of a collective institution or organization, such as a university, a company, or even the military, and there is often immense pressure from these institutions for individuals to publish a research paper, generate a profit, or defend one's country. Finally, institutions are often intentionally constructed so that individual workers are strictly limited to one component of the final product. This means that oftentimes they don't get to see the bigger picture, and it can be very hard for them to predict what kind of outcomes a piece of technology might have. However, despite these challenges, it is still important to consider what possible dual uses might arise from a specific piece of technology when we are thinking about designing and developing it.

================================================================================
LECTURE 058
================================================================================
Stanford CS221 | The AI Alignment Problem: Reward Hacking & Negative Side Effects | 2023
Source: https://www.youtube.com/watch?v=5WHObJWE1FE
---
Transcript

[00:00:05] Hello, this is your embedded ethics team. In this video we will be discussing the AI alignment problem and go over two ways in which these problems are instantiated: reward hacking and negative side effects. After watching this mini-lecture, you should be better prepared to answer problem five in the homework assignment. We'll define the AI alignment problem, discuss these two problems and give some examples to help you identify them in the future, and also discuss the ethical implications of the AI alignment problem.

[00:00:44] Let's begin by talking about the AI alignment problem. The goal of AI alignment is to ensure that AI is properly aligned with human interests. AI misalignment occurs
when an AI system is not able to achieve this. [00:00:59] So how do we define what alignment looks like? The first approach could be: the agent does what I instruct it to do. It's simple: I give it a set of instructions and it follows them. But in reality it is more complicated. Think about large models, like large language models: it's not possible for us to take such a literal approach, because there are so many parameters, contingencies, and possibilities that we cannot give an instruction for all of them. This approach also runs into issues of reward hacking, which we'll talk about later in this video.

[00:01:31] Then, what about if the agent does what I intended it to do? Suppose our development in AI is advanced enough for our models to understand the intentions behind our instructions; say they grasp our human language, our cultures, and our practices. That sounds convincing.
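The reward hacking mentioned above can be made concrete with a minimal sketch. This toy example is not from the lecture; the "cleaning robot" scenario, the proxy reward, and all function names are illustrative assumptions. The agent is scored by a proxy (a mess-sensor reading) rather than by the true goal (a clean room), so a policy that corrupts the sensor outscores one that actually cleans:

```python
# Toy illustration of reward hacking (hypothetical scenario, not from the lecture).
# The designer's proxy reward pays the agent for reporting little mess.
# An honest policy cleans; a hacking policy covers the mess sensor, which
# maximizes the proxy reward while leaving the real mess untouched.

def proxy_reward(sensor_reading):
    # Designer's proxy: more reward when the sensor reports less mess.
    return 10 - sensor_reading

def run_policy(policy, mess=10, steps=5):
    sensor_on = True
    total = 0
    for _ in range(steps):
        if policy == "clean":
            mess = max(0, mess - 2)   # actually reduces the real mess
        elif policy == "cover_sensor":
            sensor_on = False         # corrupts the measurement instead
        reading = mess if sensor_on else 0
        total += proxy_reward(reading)
    return total, mess                # (proxy reward earned, real mess left)

honest_reward, honest_mess = run_policy("clean")         # (30, 0)
hacked_reward, hacked_mess = run_policy("cover_sensor")  # (50, 10)
# The sensor-covering policy earns more proxy reward than the honest one,
# even though the real objective, a clean room, is completely unmet.
```

The gap between the proxy reward and the intended objective is exactly what makes such hacks possible; closing that gap is the alignment problem the lecture describes.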
but again we run into a problem: what if our intentions are irrational or misinformed? Should we still permit these models to operate according to our intentions? [00:01:58] Okay, then let's say we want our agent to do what I would want it to do if I were rational and informed. This way we avoid lapses in judgment or errors from limited information, but this doesn't prevent us from wanting unethical or harmful things. Depending on our notion of rationality, which we won't get into here, and however informed we are, we can still arrive at desires that nevertheless seem morally reprehensible. [00:02:27] Now we finally arrive at the values approach: we design our AI models to do what they morally ought to do, as defined by the individual or our broader society. Values indicate our judgment of what's good or bad, and of what should be morally praised or reprehended. With a values-based approach we can avoid all
the difficulties we encountered with our previous conceptions of alignment. Additionally, we can think beyond the simple calculation of maximizing good and think about how our AI models can promote our notions of justice and right. [00:03:00] Importantly, though, the values-based approach is not the end-all be-all; there can be many criticisms of the values-based approach. Similarly to how we walked through the other definitions of alignment, try to think about what some potential pushback against the values-based approach might be. [00:03:17] How we decide which values to align our AI models with can be a bit tricky, and there is no consensus on which approach is best. Values are often specific to certain use cases and communities, so determining which values to prioritize often requires being sensitive to the various cultural norms and values that your users may hold. Here we'll share three possible frameworks,
rooted in philosophy and ethics, that you could draw on to align AI models with values. The first principle is selecting values that are aligned with global public morality and previously codified human rights. Even though which values are important can vary among different communities, there are certain principles of justice that are supported by the majority of people, for example basic human rights such as the belief that all individuals should be given food, water, education, and protection from physical violence. Oftentimes these have already been implemented into regulations by government organizations. [00:04:16] The second is choosing values behind a veil of ignorance. The veil of ignorance is a thought experiment introduced by the philosopher John Rawls that asks people to consider a device that prevents them from knowing their own particular moral beliefs or
the position they will occupy in society. So, using the veil of ignorance, we might ask what principles people would choose to regulate an AI system if they did not know who they were or what belief system they ascribed to. In other words, what principles or values might people select if they did not know for certain how the AI system would impact them? This principle assumes that people are risk averse. [00:04:53] Finally, the third principle is using social choice theory to combine different viewpoints to ultimately inform the alignment of an AI model. One way of doing this is through democratic processes such as voting, discussion, and civic engagement to arrive at values. The other is by combining individual preferences into a single ranking.
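One standard way to combine individual preference rankings into a single ranking is a positional rule such as the Borda count. The sketch below is only an illustration of that idea; the candidate values and stakeholder rankings are hypothetical, not from the lecture:

```python
from collections import defaultdict

def borda_count(rankings):
    """Combine individual preference rankings into one social ranking.

    Each ranking lists candidates from most to least preferred; a
    candidate earns (n - position - 1) points per voter, and the
    aggregate ranking sorts candidates by total points.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - position - 1
    return sorted(scores, key=lambda c: -scores[c])

# Hypothetical stakeholder rankings over candidate values for an AI system.
rankings = [
    ["safety", "privacy", "efficiency"],
    ["privacy", "safety", "efficiency"],
    ["safety", "efficiency", "privacy"],
]
print(borda_count(rankings))  # → ['safety', 'privacy', 'efficiency']
```

Here "safety" wins because it is ranked first or second by every stakeholder, even though one voter prefers "privacy"; that robustness to extreme individual views is one reason positional rules are used for aggregation.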
And again, these are not the only frameworks that would be appropriate to align an AI model with values, but they should give you a starting point. [00:05:23] Now let's take a look at those three principles in practice, to help make those definitions a bit more concrete. Consider self-driving cars. If we are aligning values with the global public morality and human rights framework, we might consider existing regulations set by government entities; for example, the State of California Department of Motor Vehicles has a set of standards defining specific terms related to autonomous vehicles, requirements for testing permits, and requirements for test drivers. If we are selecting values using Rawls' veil of ignorance thought experiment, we might consider who is at greatest risk, in order to prioritize the least well off. For example, pedestrians with darker skin might be more likely to get hit by a self-driving car than white pedestrians, so maybe this informs the values we select for how to test AI models in the real world.
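The risk-aversion assumption behind the veil of ignorance can be made concrete with a toy calculation: not knowing which position you would occupy, you might judge a policy by its worst-off position (the Rawlsian maximin rule) rather than by its average outcome. The two testing policies and their utility numbers below are invented purely for illustration:

```python
# Hypothetical utilities of each societal position under two testing policies.
policies = {
    "test_on_public_roads":  {"driver": 9, "pedestrian": 2},
    "test_in_closed_course": {"driver": 6, "pedestrian": 4},
}

def maximin_choice(policies):
    """Pick the policy whose worst-off position fares best (Rawlsian maximin)."""
    return max(policies, key=lambda p: min(policies[p].values()))

def average_choice(policies):
    """Pick the policy with the highest average utility across positions."""
    return max(policies, key=lambda p: sum(policies[p].values()) / len(policies[p]))

print(maximin_choice(policies))   # → test_in_closed_course (pedestrians fare better)
print(average_choice(policies))   # → test_on_public_roads (higher average utility)
```

The two rules disagree on purpose: averaging favors the policy that is great for drivers, while maximin favors the one that protects whoever ends up worst off, which is exactly the choice someone behind the veil is assumed to make.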
[00:06:16] Finally, if we are using social choice theory, we might involve different stakeholders in collectively determining how research, deployment, and oversight of autonomous vehicles are conducted. [00:06:25] The alignment problem has important implications for real life, as systems that are misaligned with their users and society's goals can cause significant harm. Let's look at some more examples. The first is Tay. Tay was a Microsoft AI chatbot launched on Twitter in March of 2016; in less than a day it was taken down, because it was generating tweets and replies that were considered racist and sexist. The bot's behavior wasn't necessarily due to a programming error; instead, it was because the developers had not given the bot an understanding of appropriate human behavior. In the absence of that, the bot began to mimic the harmful behavior it saw among other Twitter users.
[00:07:07] We also see AI misalignment in medical applications of AI. For example, one algorithm used in the US to identify patients who might need additional care uses cost as a measure of health care need. However, because of unequal access to health care, typically less money is spent on care for Black patients compared to white patients, and this leads the algorithm to prioritize white patients over sicker Black patients. As another example in this space, during the height of the COVID-19 pandemic, Facebook tried to promote vaccine-related content from government agencies to encourage people to get vaccinated. Presumably, the intended values here aligned with society's goal to stop the spread of the disease by getting more people vaccinated, yet these posts from official sources ended up being flooded with critical comments, including misinformation.
And as posts with antivaccine comments became more visible to Facebook users, it may have undermined vaccine uptake. [00:08:04] So recall that in the mountain car assignment you learned about safe exploration in reinforcement learning as one example of a problem in AI safety. Two other problems in AI safety, which are also examples of the AI alignment problem, are reward hacking and negative side effects. While these relate to AI safety and reinforcement learning, they're also relevant to other types of AI algorithms, such as large language models, evolutionary algorithms, and genetic algorithms, so in this video we'll talk about them broadly rather than for a specific type of algorithm. [00:08:41] Let's begin by discussing reward hacking. Reward hacking occurs when an agent games its reward function. By doing this, the agent discovers a clever or easy solution that still formally satisfies the qualifications to acquire rewards
and is able to maximize the rewards it receives. The solution it has discovered might not align with the spirit of the designer's intent; in other words, the agent optimizes the formal objective function but doesn't learn the outcome intended by the programmer or designer. For example, if we reward a cleaning robot for picking up messes, one way in which it might game its reward function is by hiding messes behind furniture or under the rug; another way is by bringing in more trash and starting over once it's done, to keep receiving the rewards. [00:09:24] Let's consider two examples of reward hacking. In the first, a reinforcement learning agent that was designed to move a block to a certain position on a table learned to move the table rather than the block. In the second, ChatGPT made up fake cases related to a prompt when it was asked by a lawyer to deliver cases.
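The cleaning-robot example can be sketched as a toy simulation: if the reward is simply +1 per mess picked up, a policy that re-dumps and re-collects the same trash outscores a policy that actually cleans the room. The episode length and reward values here are invented for illustration, not the lecture's formal setup:

```python
def episode_reward(policy, messes=3, steps=10):
    """Simulate a cleaning robot paid +1 per mess it picks up.

    'honest' picks up each mess once and stops when the room is clean;
    'hacker' dumps the trash back out and re-collects it every step,
    formally satisfying the reward condition while cleaning nothing.
    """
    reward, remaining = 0, messes
    for _ in range(steps):
        if policy == "honest":
            if remaining > 0:
                remaining -= 1
                reward += 1        # one real mess cleaned
        else:  # "hacker"
            reward += messes       # re-dump and re-collect the same messes
    return reward

print(episode_reward("honest"))   # → 3: the room is actually clean
print(episode_reward("hacker"))   # → 30: far more reward, nothing accomplished
```

The point is that the reward function, not the learning algorithm, is what failed: both policies maximize exactly what they were told to maximize.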
[00:09:45] Reward hacking arises from misspecified rewards, when important aspects of the reward have been left out in the design process, leading to poor agent behavior. One way to mitigate reward hacking is to anticipate and penalize possible misbehavior in advance, but some things will be missed due to human error. Addressing these limitations is still an open problem in AI research. [00:10:09] Now we'll discuss negative side effects. Negative side effects arise when an agent's behavior, while pursuing its goals, ends up conflicting with broader societal values. Going back to the example of a cleaning robot, the robot might knock over a vase or push people and pets out of the way because it can clean faster by doing so. Some examples of negative side effects include an autonomous agent that splashes water on nearby pedestrians as it rolls by, or an AI system that completely displaces workers in a particular industry.
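The mitigation idea of anticipating and penalizing misbehavior in the objective can be sketched numerically: score two candidate trajectories with and without an explicit penalty for an anticipated side effect such as a broken vase. The weights and trajectory numbers below are arbitrary illustrations:

```python
def trajectory_return(time_steps, vases_broken, side_effect_penalty=0.0):
    """Return of a cleaning trajectory: faster is better, minus any
    penalty the designer chose to attach to anticipated side effects."""
    return 100 - time_steps - side_effect_penalty * vases_broken

fast_but_careless = {"time_steps": 10, "vases_broken": 1}
slow_but_careful  = {"time_steps": 15, "vases_broken": 0}

for penalty in (0.0, 20.0):
    best = max(
        (fast_but_careless, slow_but_careful),
        key=lambda t: trajectory_return(t["time_steps"], t["vases_broken"], penalty),
    )
    print(penalty, best)  # penalty 0.0 prefers careless; 20.0 prefers careful
```

With no penalty, the careless trajectory wins (90 vs. 85); once the designer prices the vase into the objective, the careful trajectory wins (85 vs. 70). The catch, as noted above, is that this only works for side effects the designer thought to penalize.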
[00:10:45] Negative side effects occur because the agent's model and objective function focus on some aspects of the environment over others. This can happen because of misalignment, distributional shifts, or the agent having incomplete knowledge. Misaligned systems are more likely to produce negative side effects because they are not aligned with users' intentions and goals. However, negative side effects can occur even in contexts where the agent optimizes values that align with users' objectives: for example, if an AI system is deployed in an environment that is different from the one it was tested in, and it does not have enough information about how to respond to a new scenario, negative side effects may occur. ================================================================================ LECTURE 059 ================================================================================ Stanford CS221 I Encoding Human Values I 2023 Source: https://www.youtube.com/watch?v=aWAqgzXENr0 --- Transcript [00:00:06] Hi, and welcome to your embedded ethics
module on encoding human values in technology. [00:00:12] Let's begin by talking about the main framework of values and design. This entire framework is based on the idea that design decisions are expressive of what we care about: when we create a certain technology and we design the features of that technology, those decisions encode our values, including efficiency, privacy, beauty, truth, fairness, sustainability, and others. [00:00:43] Take the example of Pi. This is a personal AI, or a conversational bot, that was designed more or less recently by the company Inflection AI. Now, a personal AI is designed to provide a conversational partner for people who cannot, or for some reason don't want to, have conversations with other human beings. This kind of technology may be driven by values such as empathy, respect, solidarity, kindness, support, etc. You can see how these values are quite apparent in the kind of language
that is generated by the chatbot and the kinds of conversations that it offers to users; it is embedded into the user interface and all kinds of features that are part of the technology. [00:01:38] Now, in the case of Pi, values emerge quite clearly from the designer's definition of the problem that the technology tries to address and the specification of design features that allow the product to solve that problem. These decisions on the part of the designer interact with users' perceptions and the broader context in which the design is employed. [00:02:11] Now, when trying to identify the values coded into any form of technology, there are two things that we should do: locate these values and define them. So let me talk about each of these in turn. To locate values, we need to look at the key influences that shape the design process.
This includes looking at who the key actors are: the designers, of course, but also the stakeholders and the users. Then there's the functional description: what problem or need is this technology addressing? It may turn out that the very problem, or the very target that is sought by the technology, is specified in terms of values; take the example of privacy-enhancing web browsers. One thing we should also look at is constraints. These constraints may be economic, technical, commercial, or legal, and all of them shape design decisions that will then end up embedding one value or another into the technology. And lastly, there's societal input: culture and social mores will shape what we can do and how it is interpreted by users. [00:03:31] Now, having identified these sources of influence, you want to consider how they shape the design, and in what ways
they channel users' and others' interpretations of the values that are embedded in that design. [00:03:47] Here I want to call your attention to a very important concept that Helen Nissenbaum refers to as collateral values. These are values that crop up as side effects of design decisions, even though they're not intended by designers. So, if you remember, the values that are coded into a product like Pi are quite explicit and quite deliberate. Collateral values are not like that: they arise, or emerge, from the way in which the design interacts with the world. These are important because they may drive serious wedges between what designers intend to express and what they end up expressing in their design; they drive wedges between intentions and actual impacts in the world. [00:04:41] One particular way in which values may unintentionally crop up is through standardization. Standardization happens when we
make implicit assumptions about who the standard user is for a given technology, or who the standard person is who is going to benefit from this technology. This is crucial because default assumptions are often a reflection of existing power imbalances, and when they go unquestioned they contribute to reinforcing those imbalances, by unwittingly discriminating against those who do not resemble the standard user or the standard person that is meant to benefit from this technology. [00:05:27] Biases play a crucial role in standardization, and these may include pre-existing biases that are already in society, in culture, or in institutions; technical biases; or emergent biases, biases that emerge when the product is used in a context that is different from its original context of use. [00:05:54] So when we assume that a certain standard user for a product is the person that we're going to be designing for,
we assume that everybody else is a non-standard user, and this places a burden of sorts on members of the groups that we fail to consider, or that we fail to include in our design, because it is harder for them to use the product and to benefit from it. Sometimes that's okay: every single decision that we make places burdens on some people and not others, and every single decision that we make provides benefits to some people and not others. But these decisions aggregate, and if all of the burdens fall on the same group or groups of people, then we enter into problems of distributive justice. [00:06:49] Now, a lot of decisions about what can be offered and what needs to be addressed may be constrained by technical or economic considerations, but some aren't, and it's important that we treat them as decisions: decisions about who we are benefiting when
technology in a certain way. Here's one interesting example of how bias can be encoded into an AI system by taking one default viewpoint. Perspective API offers automated detection of toxic content and toxicity mitigation, which is crucial when you're building a large language model, but also for tasks like the semi-automated content moderation that happens on digital platforms and in public forums of all kinds. [00:07:48] Now, in a paper published a couple of years ago, researchers from DeepMind analyzed the toxicity scores generated by Perspective API and how they impacted different groups of people. One of their key insights was to look into the definition of toxicity. As you see on the screen, the definition of toxicity is rude, disrespectful, or unreasonable language that is likely to make someone leave a
discussion. [00:08:18] Now, the authors of this paper noticed that this definition already builds biases into the system. For one thing, it is somewhat subjective and dependent on the cultural background of whoever is rating a piece of content and the kinds of sensitivities that they might have. For another, it covers only a subset of possibly harmful content: it does not cover, for instance, harmful stereotypes that may be perpetuated by something like an LLM. But most importantly, this definition prioritizes the interests of the main customers of Perspective API, namely the digital platforms that are using these toxicity mitigation tools. For these customers, the business model depends on user engagement, so a definition of toxicity that focuses on what makes somebody leave a discussion is one that seeks to protect the interests of those customers. However, this definition
does not necessarily build in the interests of the users themselves. So there is one group whose interests are being prioritized over others. This constitutes a good example of collateral values cropping up through default assumptions that get built into something as basic as the definition of the key metric in Perspective API. [00:09:48] So once we have located the values that are explicitly or implicitly embedded in design decisions, it is important to define those values. Why is that? Because ethical and political values are abstract, controversial, and difficult to define. Definition and analysis allow us to connect the abstract values that we want to encode or embed in our technologies to concrete design features. If values are not well defined, products can entirely miss their marks. So this is not simply a philosophical exercise; it is a way of
ensuring that you embed the values that you actually want to embed into the technologies that you create. [00:10:32] Think about an example that I find really interesting: some of Microsoft's early chatbots from around ten years ago. Microsoft tried to create chatbots that were inclusive. Now, if you want to make something inclusive, one possible definition of inclusion is to make a technology that is welcoming to any kind of content and any kind of topic that users are interested in. That was the approach taken by Microsoft when they designed their now infamous Tay. Tay was, as you may know, targeted by a campaign coordinated on 4chan and ended up becoming highly racist and sexist in a matter of hours, to the point that Microsoft had to take it down. Now, after Tay, Microsoft designed Zo, and Zo had a different
definition of inclusion, namely protecting vulnerable users from insult and psychological harm. The way it did this was by blacklisting certain words or topics and rejecting any conversation that went into those areas. [00:11:51] Now, that also led to the outright exclusion of users who wanted to talk about, say, being bullied for religious reasons. So that didn't quite do the job either. The point here is that the definition of inclusion was something that needed to be considered more carefully in this case, so that the products did not miss the mark completely, as these two chatbots did. Now, to finalize, I want to talk about value conflicts. Value conflicts are a crucial part of the story because they are ubiquitous, not just in technology but everywhere. We are used to discussing value conflicts in policy making or in politics, but also in design
decisions. They're important: they arise even in relatively uncontroversial contexts, and this is not the result of poor design; it is the inevitable consequence of recognizing that different things matter to us, sometimes to different degrees but often to the same degree, and that makes it hard to choose between things that are truly important. [00:13:02] This is a result of value pluralism, which we recognize as a society. Now, although these conflicts are everywhere, and although they may appear intractable, this does not mean that we should throw up our hands. Rather, we should strive to make deliberate, conscientious, and responsible choices in how we deal with these value conflicts. The values and design framework describes three different approaches to value conflicts. There's dissolving the conflict, which means finding an alternative path that avoids the conflict
entirely. There's compromising, which means making design decisions that put boundaries on some values to protect others, and vice versa: finding some kind of middle ground where some part of what we value on both sides may be attained. [00:13:59] And lastly, there is trading off, which means that we simply decide to prioritize one value and sacrifice others for its sake. Now, different situations may call for different approaches, but what matters, again, is that these decisions are made deliberately and responsibly: that we know that we are making sacrifices, and how these sacrifices are likely to impact different people, so that we can, to the best of our capacity, mitigate the negative impacts of our decisions. And that is all, so thank you very much.

================================================================================
LECTURE 060
================================================================================

Stanford CS221 | Algorithms and Distribution | 2023
Source: https://www.youtube.com/watch?v=olhFrDHP7iU

---

Transcript

[00:00:05] Hi everybody, my name is di Costa, I'm
your embedded ethics fellow for CS221. Welcome to our first mini video lecture. During this term we're going to be pairing short video lectures with the assignments that contain ethics questions, and you can use them as a reference for these assignments and in the future. [00:00:29] Right now we're going to be talking about algorithms and distribution. When you consider decision making from an algorithmic point of view, different algorithms may lead to different distributions of benefits and burdens in a population. What we're hoping you ask yourself with this assignment question is how to evaluate these distributions from an ethical perspective, and for that you need to appeal to a field of moral and political philosophy known as distributive justice. The principles of distributive justice are those that provide moral guidance for the processes and structures
that affect the distribution of benefits and burdens in societies or among populations. This is taken from the Stanford Encyclopedia of Philosophy. [00:01:17] Principles of distributive justice are applicable to all kinds of decisions that generate distributions of burdens and benefits, which may be algorithmic or otherwise. Now, what I'm going to do is give you a list of principles of distributive justice that you can use to evaluate different courses of action. Please keep in mind that these are simplified and highly intuitive versions of the principles, and this is also a non-exhaustive list: there are many more principles that you can appeal to when considering how to distribute burdens and benefits. If you're interested in finding more information, you can look at the footnotes on your assignment,
which link to various resources where you can find a lot more in-depth information about distributive justice. [00:02:18] Now, before jumping in, it's important to think about the definition of what a principle is. Think of a moral principle as a kind of norm that dictates a policy or a course of action in a given situation. What I'm going to do now is introduce three principles of distributive justice by explaining how they support different courses of action in a particular decision scenario. That decision scenario is going to be the distribution of vaccines; that is going to be our toy example in this mini lecture. So, having a limited number of vaccines and a large population, how do you allocate them? How do you allocate those vaccines among the people that you need to serve? Now, in reality one would need to consider the question at a certain level, either a
global, national, or local level. Because we will not look at the details of how this happened in the real world or in any specific setting, we're rather going to ask ourselves the question in the abstract, but you can think about it at the local level. [00:03:32] So, one potential policy would be to distribute vaccines in a way that ensures that as many people as possible get vaccinated, regardless of who they are. This policy would ensure that the highest number of people were vaccinated at the lowest cost and in the shortest time frame. You could achieve this by, say, setting up vaccination centers in densely populated areas of the city, so that you reach as many people as possible. That policy would be supported by a moral principle focused on maximizing well-being, that is, on securing the greatest net benefit. This principle is framed by a philosophical framework known as
utilitarianism, which is a kind of consequentialism according to which the right action to perform in any given circumstance is the one that maximizes utility, that is, the action that in the aggregate brings about the highest net benefit. [00:04:42] A second course of action would be to ensure that the most vulnerable populations have access to vaccines before anybody else; you could determine this on the basis of age, race, class, or abilities. This course of action would be supported by a principle focused on prioritizing those who are worst off, that is, choosing distributions that ensure that those who are the worst off are served first. There are different versions of this principle that fit under different philosophical frameworks, such as prioritarianism, which mandates that we give priority to the well-being of individuals who
are worse off, or Rawls's difference principle, according to which any inequality in the distribution of social goods should be such that it benefits those who are worst off. The third policy that we're going to consider is one that dictates that you vaccinate the members of historically marginalized groups first, by, for instance, setting vaccination sites in minority neighborhoods. [00:05:48] Why would you do this? Think about it this way: by delaying vaccination for populations that have been historically marginalized, you are placing an additional burden on the members of these groups, say by inhibiting their ability to return to work and secure income for themselves and their families, and this compounds the effects of historical discrimination. This policy would be supported by a principle focused on avoiding courses of action that disproportionately burden members of marginalized communities.
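The three policies just described can be phrased as ranking rules over the same population. Here is a minimal illustrative sketch of that idea; the population, its attributes, and the scoring rules are invented for this note (not from the lecture), and the point is only that the same limited supply gets allocated differently under each principle:

```python
# Sketch (invented toy data): three allocation rules for a limited vaccine
# supply, each corresponding to one distributive-justice principle.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Person:
    cost_to_reach: float   # lower = cheaper to vaccinate (e.g. dense area)
    vulnerability: float   # higher = worse off
    marginalized: bool     # member of a historically marginalized group

def allocate(pop: List[Person], supply: int,
             priority: Callable[[Person], object]) -> List[Person]:
    """Give the `supply` doses to the people ranked highest by `priority`."""
    return sorted(pop, key=priority, reverse=True)[:supply]

# Principle 1: maximize well-being -> cover the cheapest-to-reach people,
# so a fixed budget vaccinates as many people as possible.
utilitarian = lambda p: -p.cost_to_reach
# Principle 2: prioritize the worst off (prioritarianism / Rawls).
prioritarian = lambda p: p.vulnerability
# Principle 3: avoid compounding historical injustice -> marginalized
# members first, breaking ties by vulnerability.
anti_compounding = lambda p: (p.marginalized, p.vulnerability)

pop = [
    Person(cost_to_reach=1.0, vulnerability=0.2, marginalized=False),
    Person(cost_to_reach=1.0, vulnerability=0.9, marginalized=False),
    Person(cost_to_reach=5.0, vulnerability=0.8, marginalized=True),
    Person(cost_to_reach=4.0, vulnerability=0.5, marginalized=True),
]
for name, rule in [("utilitarian", utilitarian),
                   ("prioritarian", prioritarian),
                   ("anti-compounding", anti_compounding)]:
    chosen = allocate(pop, supply=2, priority=rule)
    print(name, [pop.index(c) for c in chosen])
```

With this toy population the utilitarian rule picks the two cheapest-to-reach people, the prioritarian rule picks the two most vulnerable, and the anti-compounding rule picks the marginalized members first, so the same algorithmic skeleton produces three different distributions of the same benefit, which is exactly the contrast the assignment question asks you to evaluate.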
Some have called this the anti-compounding injustice principle, and it is driven by the idea that algorithmic decision systems should deliberately focus on avoiding contributing to historical injustice and discrimination. [00:06:41] Now, to summarize, I have presented you with three principles of distributive justice, or, as I said, intuitive versions of these principles: one that focuses on maximizing well-being, one that focuses on prioritizing those who are worse off, and one that focuses on avoiding compounding historical injustice. Please remember, again, that this is an intuitive and not an exhaustive list of distributive justice principles. But what matters here is that these principles are applicable to distributions of benefits and burdens through algorithmic decision making.

================================================================================
LECTURE INDEX.md
================================================================================

CS221 – Artificial Intelligence

Playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rOca_Ovz1DvdtWuz8BfSWL2
Total Videos: 60
Transcripts Downloaded: 60
Failed/No Captions: 0

---

Lectures

1. General Intro | Stanford CS221: Artificial Intelligence: Principles and Techniques (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=ZiwogMtbjr4](https://www.youtube.com/watch?v=ZiwogMtbjr4)
   - Transcript: [001_ZiwogMtbjr4.md](001_ZiwogMtbjr4.md)
2. AI History | Stanford CS221: AI (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=z8fEXuH0mu0](https://www.youtube.com/watch?v=z8fEXuH0mu0)
   - Transcript: [002_z8fEXuH0mu0.md](002_z8fEXuH0mu0.md)
3. Artificial Intelligence Today | Stanford CS221: AI (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=C0IhR4D5KYc](https://www.youtube.com/watch?v=C0IhR4D5KYc)
   - Transcript: [003_C0IhR4D5KYc.md](003_C0IhR4D5KYc.md)
4. Artificial Intelligence and Machine Learning 1 - Overview | Stanford CS221: AI (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=mtrYwgIrRNk](https://www.youtube.com/watch?v=mtrYwgIrRNk)
   - Transcript: [004_mtrYwgIrRNk.md](004_mtrYwgIrRNk.md)
5. Artificial Intelligence & Machine Learning 2 - Linear Regression | Stanford CS221: AI (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=nEWNNt2KmfQ](https://www.youtube.com/watch?v=nEWNNt2KmfQ)
   - Transcript: [005_nEWNNt2KmfQ.md](005_nEWNNt2KmfQ.md)
6. Artificial Intelligence & Machine learning 3 - Linear Classification | Stanford CS221 (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=WcaMiqJR09s](https://www.youtube.com/watch?v=WcaMiqJR09s)
   - Transcript: [006_WcaMiqJR09s.md](006_WcaMiqJR09s.md)
7. Artificial Intelligence & Machine Learning 4 - Stochastic Gradient Descent | Stanford CS221 (2021)
   - Video: [https://www.youtube.com/watch?v=bl2WgBLH0tI](https://www.youtube.com/watch?v=bl2WgBLH0tI)
   - Transcript: [007_bl2WgBLH0tI.md](007_bl2WgBLH0tI.md)
8.
Artificial Intelligence and Machine Learning 5 - Group DRO | Stanford CS221: AI (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=ZFK2XtWqUbw](https://www.youtube.com/watch?v=ZFK2XtWqUbw)
   - Transcript: [008_ZFK2XtWqUbw.md](008_ZFK2XtWqUbw.md)
9. Artificial Intelligence & Machine Learning 6 - Non Linear Features | Stanford CS221: AI (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=eIxbNkB4byY](https://www.youtube.com/watch?v=eIxbNkB4byY)
   - Transcript: [009_eIxbNkB4byY.md](009_eIxbNkB4byY.md)
10. Artificial Intelligence & Machine Learning 7 - Feature Templates | Stanford CS221: AI (Autumn 2021)
    - Video: [https://www.youtube.com/watch?v=2QfSBLtvioE](https://www.youtube.com/watch?v=2QfSBLtvioE)
    - Transcript: [010_2QfSBLtvioE.md](010_2QfSBLtvioE.md)
11. Artificial Intelligence & Machine Learning 8 - Neural Networks | Stanford CS221: AI (Autumn 2021)
    - Video: [https://www.youtube.com/watch?v=pnKXgBHuN58](https://www.youtube.com/watch?v=pnKXgBHuN58)
    - Transcript: [011_pnKXgBHuN58.md](011_pnKXgBHuN58.md)
12. Machine Learning 9 - Backpropagation | Stanford CS221: AI (Autumn 2021)
    - Video: [https://www.youtube.com/watch?v=OcAF-l2xB9Y](https://www.youtube.com/watch?v=OcAF-l2xB9Y)
    - Transcript: [012_OcAF-l2xB9Y.md](012_OcAF-l2xB9Y.md)
13. Machine Learning 10 - Differentiable Programming | Stanford CS221: AI (Autumn 2021)
    - Video: [https://www.youtube.com/watch?v=c5btEEisp_g](https://www.youtube.com/watch?v=c5btEEisp_g)
    - Transcript: [013_c5btEEisp_g.md](013_c5btEEisp_g.md)
14. Artificial Intelligence & Machine Learning 11 - Generalization | Stanford CS221: AI (Autumn 2021)
    - Video: [https://www.youtube.com/watch?v=Gq-Ah-QrOQM](https://www.youtube.com/watch?v=Gq-Ah-QrOQM)
    - Transcript: [014_Gq-Ah-QrOQM.md](014_Gq-Ah-QrOQM.md)
15.
Artificial Intelligence & Machine Learning 12 - Best Practices | Stanford CS221: AI (Autumn 2021)
    - Video: [https://www.youtube.com/watch?v=ouvGV2YZEEM](https://www.youtube.com/watch?v=ouvGV2YZEEM)
    - Transcript: [015_ouvGV2YZEEM.md](015_ouvGV2YZEEM.md)
16. Machine Learning 13 - K-means | Stanford CS221: AI (Autumn 2021)
    - Video: [https://www.youtube.com/watch?v=5-Fn8R9fH7A](https://www.youtube.com/watch?v=5-Fn8R9fH7A)
    - Transcript: [016_5-Fn8R9fH7A.md](016_5-Fn8R9fH7A.md)
17. Search 1 - Dynamic Programming, Uniform Cost Search | Stanford CS221: AI (Autumn 2019)
    - Video: [https://www.youtube.com/watch?v=aIsgJJYrlXk](https://www.youtube.com/watch?v=aIsgJJYrlXk)
    - Transcript: [017_aIsgJJYrlXk.md](017_aIsgJJYrlXk.md)
18. Search 2 - A* | Stanford CS221: Artificial Intelligence (Autumn 2019)
    - Video: [https://www.youtube.com/watch?v=HEs1ZCvLH2s](https://www.youtube.com/watch?v=HEs1ZCvLH2s)
    - Transcript: [018_HEs1ZCvLH2s.md](018_HEs1ZCvLH2s.md)
19. Markov Decision Processes 1 - Value Iteration | Stanford CS221: AI (Autumn 2019)
    - Video: [https://www.youtube.com/watch?v=9g32v7bK3Co](https://www.youtube.com/watch?v=9g32v7bK3Co)
    - Transcript: [019_9g32v7bK3Co.md](019_9g32v7bK3Co.md)
20. Markov Decision Processes 2 - Reinforcement Learning | Stanford CS221: AI (Autumn 2019)
    - Video: [https://www.youtube.com/watch?v=HpaHTfY52RQ](https://www.youtube.com/watch?v=HpaHTfY52RQ)
    - Transcript: [020_HpaHTfY52RQ.md](020_HpaHTfY52RQ.md)
21. Game Playing 1 - Minimax, Alpha-beta Pruning | Stanford CS221: AI (Autumn 2019)
    - Video: [https://www.youtube.com/watch?v=3pU-Hrz_xy4](https://www.youtube.com/watch?v=3pU-Hrz_xy4)
    - Transcript: [021_3pU-Hrz_xy4.md](021_3pU-Hrz_xy4.md)
22. Game Playing 2 - TD Learning, Game Theory | Stanford CS221: Artificial Intelligence (Autumn 2019)
    - Video: [https://www.youtube.com/watch?v=WoFwXj4p4Sc](https://www.youtube.com/watch?v=WoFwXj4p4Sc)
    - Transcript: [022_WoFwXj4p4Sc.md](022_WoFwXj4p4Sc.md)
23.
Constraint Satisfaction Problems (CSPs) 1 - Overview | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=-IO4fPO0rxk](https://www.youtube.com/watch?v=-IO4fPO0rxk) - Transcript: [023_-IO4fPO0rxk.md](023_-IO4fPO0rxk.md) 24. Constraint Satisfaction Problems (CSPs) 2 - Definitions | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=uj5wCcHsSlA](https://www.youtube.com/watch?v=uj5wCcHsSlA) - Transcript: [024_uj5wCcHsSlA.md](024_uj5wCcHsSlA.md) 25. Constraint Satisfaction Problems (CSPs) 3 - Examples | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=Tu6BiZhMDCc](https://www.youtube.com/watch?v=Tu6BiZhMDCc) - Transcript: [025_Tu6BiZhMDCc.md](025_Tu6BiZhMDCc.md) 26. Constraint Satisfaction Problems (CSPs) 4 - Dynamic Ordering | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=Lyu8VzbIe_A](https://www.youtube.com/watch?v=Lyu8VzbIe_A) - Transcript: [026_Lyu8VzbIe_A.md](026_Lyu8VzbIe_A.md) 27. Constraint Satisfaction Problems (CSPs) 5 - Arc Consistency | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=5rlIYGJdPy4](https://www.youtube.com/watch?v=5rlIYGJdPy4) - Transcript: [027_5rlIYGJdPy4.md](027_5rlIYGJdPy4.md) 28. Constraint Satisfaction Problems (CSPs) 6 - Beam Search | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=XuWMeIHGkus](https://www.youtube.com/watch?v=XuWMeIHGkus) - Transcript: [028_XuWMeIHGkus.md](028_XuWMeIHGkus.md) 29. Constraint Satisfaction Problems (CSPs) 7 - Local Search | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=VwZKPlK6jUg](https://www.youtube.com/watch?v=VwZKPlK6jUg) - Transcript: [029_VwZKPlK6jUg.md](029_VwZKPlK6jUg.md) 30. Markov Networks 1 - Overview | Stanford CS221: Artificial Intelligence (Autumn 2021) - Video: [https://www.youtube.com/watch?v=neeaJb3wCYw](https://www.youtube.com/watch?v=neeaJb3wCYw) - Transcript: [030_neeaJb3wCYw.md](030_neeaJb3wCYw.md) 31. 
Markov Networks 2 - Gibbs Sampling | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=k6aZZF2pk7k](https://www.youtube.com/watch?v=k6aZZF2pk7k) - Transcript: [031_k6aZZF2pk7k.md](031_k6aZZF2pk7k.md) 32. Bayesian Networks 1 - Overview | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=fA7zP6EcVdw](https://www.youtube.com/watch?v=fA7zP6EcVdw) - Transcript: [032_fA7zP6EcVdw.md](032_fA7zP6EcVdw.md) 33. Bayesian Networks 2 - Definition | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=xvC6XmZmR_U](https://www.youtube.com/watch?v=xvC6XmZmR_U) - Transcript: [033_xvC6XmZmR_U.md](033_xvC6XmZmR_U.md) 34. Bayesian Networks 3 - Probabilistic Programming | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=ZVk8y1zVoD4](https://www.youtube.com/watch?v=ZVk8y1zVoD4) - Transcript: [034_ZVk8y1zVoD4.md](034_ZVk8y1zVoD4.md) 35. Bayesian Networks 4 - Probabilistic Inference | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=-dGOWB9Zh8s](https://www.youtube.com/watch?v=-dGOWB9Zh8s) - Transcript: [035_-dGOWB9Zh8s.md](035_-dGOWB9Zh8s.md) 36. Bayesian Networks 5 - Forward-backward Algorithm | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=N-ZPbpJOQs0](https://www.youtube.com/watch?v=N-ZPbpJOQs0) - Transcript: [036_N-ZPbpJOQs0.md](036_N-ZPbpJOQs0.md) 37. Bayesian Networks 6 - Particle Filtering | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=8sOtXbQIOuE](https://www.youtube.com/watch?v=8sOtXbQIOuE) - Transcript: [037_8sOtXbQIOuE.md](037_8sOtXbQIOuE.md) 38. Bayesian Networks 7 - Supervised Learning | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=_rbDjsJTgm8](https://www.youtube.com/watch?v=_rbDjsJTgm8) - Transcript: [038__rbDjsJTgm8.md](038__rbDjsJTgm8.md) 39. 
Bayesian Networks 8 - Smoothing | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=M7rWvN_0xbw](https://www.youtube.com/watch?v=M7rWvN_0xbw) - Transcript: [039_M7rWvN_0xbw.md](039_M7rWvN_0xbw.md) 40. Bayesian Networks 9 - EM Algorithm | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=CPVFJBd-Qcg](https://www.youtube.com/watch?v=CPVFJBd-Qcg) - Transcript: [040_CPVFJBd-Qcg.md](040_CPVFJBd-Qcg.md) 41. Logic 1 - Overview: Logic Based Models | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=oM5LUGPO7Zk](https://www.youtube.com/watch?v=oM5LUGPO7Zk) - Transcript: [041_oM5LUGPO7Zk.md](041_oM5LUGPO7Zk.md) 42. Logic 2 - Propositional Logic Syntax | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=LBjNaewGJzk](https://www.youtube.com/watch?v=LBjNaewGJzk) - Transcript: [042_LBjNaewGJzk.md](042_LBjNaewGJzk.md) 43. Logic 3 - Propositional Logic Semantics | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=N37yIn1jX98](https://www.youtube.com/watch?v=N37yIn1jX98) - Transcript: [043_N37yIn1jX98.md](043_N37yIn1jX98.md) 44. Logic 4 - Inference Rules | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=RIk67yGMVv4](https://www.youtube.com/watch?v=RIk67yGMVv4) - Transcript: [044_RIk67yGMVv4.md](044_RIk67yGMVv4.md) 45. Logic 5 - Propositional Modus Ponens | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=6bj4z2mt1KE](https://www.youtube.com/watch?v=6bj4z2mt1KE) - Transcript: [045_6bj4z2mt1KE.md](045_6bj4z2mt1KE.md) 46. Logic 6 - Propositional Resolutions | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=egLAF4dFdBo](https://www.youtube.com/watch?v=egLAF4dFdBo) - Transcript: [046_egLAF4dFdBo.md](046_egLAF4dFdBo.md) 47. 
Logic 7 - First Order Logic | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=Z-O0Q3_oTJM](https://www.youtube.com/watch?v=Z-O0Q3_oTJM) - Transcript: [047_Z-O0Q3_oTJM.md](047_Z-O0Q3_oTJM.md) 48. Logic 8 - First Order Modus Ponens | Stanford CS221: Artificial Intelligence (Autumn 2021) - Video: [https://www.youtube.com/watch?v=mndzhfBpyUw](https://www.youtube.com/watch?v=mndzhfBpyUw) - Transcript: [048_mndzhfBpyUw.md](048_mndzhfBpyUw.md) 49. Logic 9 - First Order Resolution | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=iG_tz7ZjZAI](https://www.youtube.com/watch?v=iG_tz7ZjZAI) - Transcript: [049_iG_tz7ZjZAI.md](049_iG_tz7ZjZAI.md) 50. Logic 10 - Recap | Stanford CS221: Artificial Intelligence (Autumn 2021) - Video: [https://www.youtube.com/watch?v=LYsOjtmLpPo](https://www.youtube.com/watch?v=LYsOjtmLpPo) - Transcript: [050_LYsOjtmLpPo.md](050_LYsOjtmLpPo.md) 51. AI and Law I Mariano-Florentino Cuéllar, President of the Carnegie Endowment for International Peace - Video: [https://www.youtube.com/watch?v=_-hBu3_Jz-0](https://www.youtube.com/watch?v=_-hBu3_Jz-0) - Transcript: [051__-hBu3_Jz-0.md](051__-hBu3_Jz-0.md) 52. Stanford Fireside Talks: Robustness in Machine Learning I Robust Machine Learning - Video: [https://www.youtube.com/watch?v=xr8AHGlieOE](https://www.youtube.com/watch?v=xr8AHGlieOE) - Transcript: [052_xr8AHGlieOE.md](052_xr8AHGlieOE.md) 53. Fireside Talks: State of Robotics I Automation and Robotics Engineering Lectures - Stanford - Video: [https://www.youtube.com/watch?v=hVsR9DdR3qE](https://www.youtube.com/watch?v=hVsR9DdR3qE) - Transcript: [053_hVsR9DdR3qE.md](053_hVsR9DdR3qE.md) 54. Stanford Talk: Inequality in Healthcare, AI & Data Science to Reduce Inequality - Improve Healthcare - Video: [https://www.youtube.com/watch?v=0IZhDmh1dmI](https://www.youtube.com/watch?v=0IZhDmh1dmI) - Transcript: [054_0IZhDmh1dmI.md](054_0IZhDmh1dmI.md) 55. 
Fireside Talks: Artificial Intelligence (AI) and Language - Video: [https://www.youtube.com/watch?v=pI72PseZQo8](https://www.youtube.com/watch?v=pI72PseZQo8) - Transcript: [055_pI72PseZQo8.md](055_pI72PseZQo8.md) 56. General Conclusion | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=iUGmupxCdjs](https://www.youtube.com/watch?v=iUGmupxCdjs) - Transcript: [056_iUGmupxCdjs.md](056_iUGmupxCdjs.md) 57. Stanford CS221 I Externalities and Dual-Use Technologies I 2023 - Video: [https://www.youtube.com/watch?v=2xQLCXqOtdU](https://www.youtube.com/watch?v=2xQLCXqOtdU) - Transcript: [057_2xQLCXqOtdU.md](057_2xQLCXqOtdU.md) 58. Stanford CS221 I The AI Alignment Problem: Reward Hacking & Negative Side Effects I 2023 - Video: [https://www.youtube.com/watch?v=5WHObJWE1FE](https://www.youtube.com/watch?v=5WHObJWE1FE) - Transcript: [058_5WHObJWE1FE.md](058_5WHObJWE1FE.md) 59. Stanford CS221 I Encoding Human Values I 2023 - Video: [https://www.youtube.com/watch?v=aWAqgzXENr0](https://www.youtube.com/watch?v=aWAqgzXENr0) - Transcript: [059_aWAqgzXENr0.md](059_aWAqgzXENr0.md) 60. Stanford CS221 I Algorithms and Distribution I 2023 - Video: [https://www.youtube.com/watch?v=olhFrDHP7iU](https://www.youtube.com/watch?v=olhFrDHP7iU) - Transcript: [060_olhFrDHP7iU.md](060_olhFrDHP7iU.md)