================================================================================ LECTURE 001 ================================================================================ General Intro | Stanford CS221: Artificial Intelligence: Principles and Techniques (Autumn 2021) Source: https://www.youtube.com/watch?v=ZiwogMtbjr4 --- Transcript [00:00:05] Okay, hello everyone. I'm Dorsa Sadigh, and I am one of the co-instructors of CS221. Today I'm here with Percy Liang and our group of TAs to teach the first lecture of 221. So with that, before getting started on the details of the class, I would like to first introduce the team. I am Dorsa Sadigh, an assistant professor in computer science. This is the fifth time I'm teaching CS221, the second time I'm teaching it online, and I think the third time I'm co-teaching with Percy Liang, so I'm really excited to start the quarter with you all. A little bit about my research: my work is in robotics and AI, and in
general I'm very interested in the interaction of robotics and AI agents with humans and with other intelligent agents. So if these topics are of interest, come to office hours; we'd love to chat about them and talk offline in general too. My co-instructor today is Percy Liang. I think I saw Percy somewhere... [00:01:18] Yeah, I'm here. Hello everyone, I'm Percy, the co-instructor, and I think this is my ninth or tenth year teaching 221. It's really been interesting how AI has evolved since when I first started talking about it. My research interests are in machine learning and natural language processing, thinking about how to make systems more robust and trustworthy. Recently I've been really fascinated by what we've been calling foundation models, models such as GPT-3, BERT, and DALL-E, and I'm going to discuss that more
later in the class. [00:01:55] All right, thank you, Percy. So what are we going to be talking about today? Our plan is to talk a little bit about the course logistics and then the course content, what we are actually going to cover as part of this class. Then we'll have some icebreakers, a five-minute breakout room where we'll discuss things about AI. Toward the end of the class I'm going to give a brief history of AI, then talk about what AI is today, what some risks and benefits of AI are, and how we should think about it moving forward. So that is our plan for today. Before I start: if there are any questions, feel free to put them in the Zoom chat or raise your hand, and the CAs can try to answer the questions or
ask them throughout as I give the talk. [00:02:48] All right, so let's talk about course logistics. We're going to have a set of activities as part of the class. Last year, when we had to go virtual because of COVID, we started experimenting with a few different ways of changing and reformatting the class; some of them worked really well and some of them didn't work so well. Based on that experience from last year, we've decided to switch up the activities a little bit more, and some of these changes really make sense to keep even during a normal quarter, even when we're not virtual. One of these changes is going from traditional lectures to something we're calling modules. These are basically pre-recorded lectures that
are broken into small bite-sized chunks. For every topic we're going to cover in this class, we'll have a module, a lecture of about 10 to 20 minutes, that goes over that topic. These are pre-recorded, and we're going to release each week's modules on the Monday of that week, so you have time to watch them on your own schedule, when it makes sense for you. It's a little bit easier to manage these bite-sized chunks; that's one reason we're moving toward these modules. Also, since they're pre-recorded (we're probably going to use the same recordings as last year), we have more time to spend with you during our lecture times, in a kind of flipped format. So that's the modules. Then, in addition to that,
during our normal lecture time you're going to have two types of activities. On Mondays you're going to have faculty chats. These will be on Zoom, and they're basically small-group discussions with faculty on AI-related topics. There will be six sessions of roughly 25 minutes each: every Monday from 1:30 to 3 p.m., Percy and I will each have a Zoom room for each session, and you're going to be assigned to one of these faculty chats. It is actually mandatory to attend at least one of them. The reason we are doing this is that traditionally, when you have a large AI course with an enrollment of 300-something, it's really hard to get to know you and actually talk to you, and sometimes it becomes really difficult to get to know the faculty when you're in
some of these larger classes. What we're trying to do here is really to get to know you through these faculty chats and to discuss AI-related topics: some of the more recent research material around newer topics like foundation models, which Percy is going to talk about, or topics around robotics, autonomous driving, and so on. I'll talk about the exact format a little bit later in the talk, and there is a little bit of homework to do before coming to these sessions. But the idea is that we'll have these, not in person, sorry, virtual faculty chats on Mondays during lecture time. One other point I want to mention: if you have conflicts during lecture times, you want to make sure that you
won't have a conflict for the time slot you're assigned, because attendance is mandatory. So make sure you actually don't have conflicts during lecture times. [00:06:06] The second bit is problem sessions. These are going to be on Wednesdays, again during class time. They're kind of like traditional sections, except we have changed them a little bit based on the feedback we got last year. During these problem sessions the CAs will work out practice problems: these could be previous years' quizzes or exam questions, or just problems that can help you get started on your homework or get ready for some of the exam questions later on. So I do recommend going to these problem sessions; they're
incredibly useful for getting your hands dirty with the topics you're learning that week. Again, this is on Zoom on Wednesdays during class time. [00:06:54] What else? We're also going to have homework parties. Homework parties used to be very popular when they were in person; last year I think it was a little bit more difficult to make them happen, but eventually people realized that homework parties matter a lot, because they're a very good place to show up and work with other people on your homework problems, get started on some of the more challenging problems together, study together, and the CAs will be there to answer questions. These homework parties are going to happen on Nooks, a platform we started using last year. Again, all the information about Zoom and Nooks, all these links, is on the
CS221 website, so the details of everything I say today are also on the website. [00:07:37] Beyond homework parties, we also have office hours. The CAs hold office hours, and there are two types. There is a set of in-person office hours, which I was talking about earlier; these are limited, but they're going to be in the basement of Huang, and they are group-based. There's no sign-up required; basically there's a group of students with a CA in the basement of Huang. In addition to that, the majority of office hours are actually going to be virtual, and these are by appointment. We used Calendly last year, partly because the queues were becoming too long to handle, so now you can make an appointment for a CA office hour. It's going
to be a one-on-one office hour, and these office hours will happen on Zoom. In addition, our CA office hours come in two categories: we have separated general office hours from homework office hours. If you have homework questions, you should just go to the dedicated homework office hours; but if you have more general questions about the course, if you're thinking about your project or about general AI questions, you should go to the general office hours or the faculty office hours. [00:08:44] And then the final thing we have is faculty office hours. Percy and I will each hold 50-minute office hours weekly, and the schedule for this is also on the website, so you can take a look at that. Again, it's one-on-one, it is going to be virtual, and you can sign up beforehand and come chat with us. Again, all the details are on the website. [00:09:09] All
right, so let me just see if there are any questions here. Does anyone have any questions? Should I look at the chat? Okay, let me quickly look at the chat. "How do we know which faculty chat to attend?" We will be in touch about this soon. Actually, on that, I'll talk about this in a little bit, but basically there's a survey that you need to fill out by, correct me if I'm wrong, Wednesday. Wednesday? Okay, by Wednesday, yeah. That is basically to get your preferences on which faculty chat you would like to go to, and then we'll assign you to specific faculty chats after that. [00:10:01] "Do we sign up for CA and faculty office hours through Nooks?" No, we will use Calendly for signing up, but we'll use Nooks for answering questions. All right. [00:10:17] So if
there are no more questions, let me go to the next slide. Now let's talk a little bit about prereqs; this is a question that often comes up. What are the prereqs for the class? You need to have some programming background, and it would be good if it is in Python; there are some courses that are prereqs for this course. In addition to that, it's a good idea to have some math background, so discrete math (CS103) is a prereq, and it would be a good idea to have some background in probability and linear algebra, so CS109 and Math 51.
Okay. But in general we want you to have familiarity with these, some mathematical rigor, and general familiarity with probability, linear algebra, and discrete math, these types of topics. We are not really expecting very specific knowledge; for example, in linear algebra you'll learn about eigenvectors, but we don't really require knowledge of eigenvectors in this class. So there aren't specific topics we're looking for, but generally you want to know math, you want to know programming, and you want to come into class with that knowledge. The reason is that this course is fairly fast-paced, so you don't want to spend your time learning Python or learning math through this class. Your Python programming is going to improve, your math knowledge is going to improve, but you don't want to
spend time learning this background material; you really want to spend all of your time learning AI. If there are gaps, some people do catch up, it is possible, but again, you want to spend your time learning AI, so we kind of leave it to you to decide and move forward. And you might ask, okay, how do I decide? We have a couple of things online that you can take a look at. We have a set of modules that we actually recorded last year, prereq modules, which provide refreshers on some of these topics. So definitely take a look at these prereq modules; they give you a good sense of what you're required to know coming into this class. In addition to that, the first homework is based on foundations, and it really gives you a good idea of
what to expect as part of this class in terms of, again, the programming and math knowledge coming in. So take a look at these before deciding whether or not to skip a prereq, but in general, again, I do think it's a good idea to have this background coming into the class. [00:12:34] Let's then talk about grading a little bit. Grading is fairly straightforward. We're going to have a set of homeworks, which is 55% of the grade. We're going to have two exams, which is 40% of the grade. For the faculty chats, we actually count participation as part of the grade, so that is 5%. And then the project we are going to make optional this year, so it will count towards extra credit. Finally, if you contribute on Ed (we're going to use Ed this quarter as opposed to Piazza), that is also going to give you some level of
extra credit. In general, you can take the class for a letter grade or pass/no pass; that is also your choice. So now let's talk about each of these components in a little more detail. [00:13:22] In terms of homeworks, we have eight homeworks, and they are a mix of programming questions and written questions. The programming problems are mainly focused on a specific application: for example, we might be looking at blackjack as a game, or at Pac-Man, or at various topics like car tracking. So there's a particular application used as part of the programming component of each homework. These programming components are auto-graded, and there are a set of public and private tests, so you should definitely try out the public
[00:13:59] first. Make sure that you test thoroughly, because the grading is strict: it's based on auto-grading and you don't see all the tests. That's the point I was trying to make here. In addition, you have seven total late days, and you can use a maximum of two per homework. The reason for that is we want to release the homework solutions, so you can't use more than two late days per homework. Okay, so that is our plan for homeworks, the usual; we'll go with that. One other point I want to add on homeworks is that we are adding an extra component to every single homework, which is an ethics component. An ethics component is going to be added to all of our homeworks; it's a new addition that we're having this quarter, and we're also going to significantly change some of these homeworks
[00:14:45] to incorporate an ethics question into them. So we're trying to incorporate that throughout the class, throughout these homeworks, so that would also be an addition to consider in this course. All right, moving forward with exams. Last year we decided to do a set of quizzes; this year we're not going to do the quizzes, students don't really like having one every week, so instead we're going to have two exams. And the point of the exams is really to test your ability to apply your knowledge to new problems. It's not really about knowing the facts we're teaching; it's more about your knowledge of AI and whether you can actually apply it to new problems. All these problems are going to be written, so no coding, and you should take a look at past exams to get a sense of
[00:15:34] how these problems look and what their format is. Each one of the exams is going to be a hundred minutes, and these exams are going to be open book. We actually have the dates for these exams already; they are going to be released in a 24-hour window. The first one is going to be released on October 29th at 3:15 p.m. and is going to be due the next day at 3:15 p.m. Pacific time, and similarly we have exam 2 on December 8th, again at 3:15 p.m. Pacific time. All right, so we have these dates; if you have major conflicts with any of these dates, you should let us know by October 8th, which is week three of the class. In addition, we will not have any late days for these exams, again because we need to release solutions, so we need to make sure
[00:16:28] it works for everyone. So no late days get applied to the exams, and of course no collaboration on the exams. Please do not talk about the exams on Ed: if you've done the exam and you're done with it, but there's still time left within that 24-hour window, do not post anything about the exam. Okay, so that was exams. The last component that is mandatory as part of the class is the faculty chat participation. As I was mentioning earlier, the goal of this is really discussing topics in and around AI. So fill out the initial survey that I was talking about by Wednesday, so that we can start scheduling these; you're going to be assigned a session. Again, six sessions run in parallel on Mondays, during class time on Mondays, so make sure that you can actually
[00:17:20] make that time. You should prepare before these sessions, and the sessions are going to be on different topics. If they are on specific research topics, like robotics, autonomous driving, ethics, robustness, or foundation models, we often have some related material that we release beforehand. Sometimes it's a fireside chat to watch (we had a set of fireside chats last year), or a talk to watch beforehand, so that you come to the session a little bit prepared and we can talk about these topics. We also have another set of topics that are really more about academia versus industry, graduate school, or how you read a research paper, so some of these other components are not necessarily about a particular research area. And again,
[00:18:11] you'll have some reading material for this beforehand, so that you come in prepared. The way we are looking at participation in these faculty chats is: as you come in, you should introduce yourself, and you should also share a little bit about your thoughts or your goals for that session. So you should actively participate in that 25-minute session, and that's what we expect when we're thinking about grading participation during these faculty chats. You will not be tested on the material that you're discussing in the faculty chats; I just wanted to mention that. All right. [Student] Do we need to attend one faculty chat session to get credit? Yes, you will be assigned to one faculty chat. If there is room you can actually attend more faculty chats; we are potentially going to have more room based on the
[00:19:03] going to have more room based on the number of students who are enrolled but [00:19:04] number of students who are enrolled but uh we will be in touch on like what is [00:19:06] uh we will be in touch on like what is like what are the availabilities and if [00:19:08] like what are the availabilities and if you can attend more than one faculty [00:19:10] you can attend more than one faculty chat but yeah you'll be assigned one [00:19:13] chat but yeah you'll be assigned one okay so let me talk about uh the project [00:19:16] okay so let me talk about uh the project also real quick so [00:19:18] also real quick so um the project this quarter is going to [00:19:20] um the project this quarter is going to be optional uh this is what we did last [00:19:22] be optional uh this is what we did last year too because the course is virtual [00:19:24] year too because the course is virtual and we thought it would be um [00:19:26] and we thought it would be um it might be a little bit more difficult [00:19:27] it might be a little bit more difficult to to find a team and work together but [00:19:29] to to find a team and work together but regardless like a lot of students did [00:19:31] regardless like a lot of students did the project last year and and there are [00:19:34] the project last year and and there are a lot of interesting ideas and projects [00:19:36] a lot of interesting ideas and projects came out of that and it was really [00:19:37] came out of that and it was really exciting to see like so many cool [00:19:39] exciting to see like so many cool projects uh like during that quarter too [00:19:41] projects uh like during that quarter too so i do recommend that you guys look [00:19:43] so i do recommend that you guys look into this closely even though it is [00:19:44] into this closely even though it is optional so so the idea is you want to [00:19:46] optional so so the idea is you want to choose a task uh where you can actually [00:19:49] choose a task 
[00:19:51] where you can actually apply some of the ideas that you have learned as part of this class and use those techniques for that particular task. It's a little bit open-ended, you need to decide what that task is, but that's also the beauty of it, right? You can pick anything and apply some of the AI techniques that you're learning. The idea is that you can work in groups of up to four people, and then you also have a set of milestones: you need to fill out a project interest form, there's a proposal, a progress report, and there's a video and final report that you need to do. So if you decide to do the project and actually get the extra credit, you should pass through all these milestones and finish the project. Again, the task is completely open, but there is a set of
[00:20:33] well-defined steps that we expect you to have throughout the course for this project. This includes things like defining the task, implementing your baselines and oracles, having a literature review, and thinking about what your evaluation metrics are. And you will have a CA assigned to you: if you decide to do a project, you'll have a CA assigned to your group, and your CA can also walk you through some of these different components that you want to have as part of your project. In addition, one other thing that we've added is a mandatory check-in meeting with your CA. This is a 15-minute mandatory check-in meeting with your CA; we think this is really useful to make sure that you keep up with the project if you decide to do it. And in
[00:21:23] general, if you want to think about ideas for what to do for your project, or if you have some idea and you want to discuss it, definitely come to office hours. You can come to Percy's and my office hours, or the CAs' office hours, and discuss some of these questions. All right. The last point that I want to mention on logistics is the honor code. I want to spend a little bit of time talking about this because it is really important: you don't want to deal with it, we don't want to deal with it, so let's just talk about it and get it out of the way. Especially this quarter, given that things are online, we do want you to collaborate, to discuss together, learn together, think about problems together, but the write-up and the code need to be done independently. So you need to write your code and
[00:22:08] write up your solutions independently, based on your own thoughts and your own ideas. So please do not share code, please do not share your write-ups with others, and don't look at anyone else's write-up or code, even if it is on the internet and you found it; do not look at these things. And do not post your solutions online: even if you're proud of your code, you shouldn't post it on GitHub, do not do that. In general, when you're debugging, try to look at input-output behavior. You could be going to homework parties and debugging your solutions with other people; really just look at input-output behavior, don't look at each other's code, and that way you'll be safe. But I do want to emphasize that we do run MOSS periodically, and this will automatically detect if there is matching between
[00:22:55] code, so please do not do that; MOSS is really good. Every year we have a number of cases, and sometimes we run these things mid-quarter, so I want to emphasize that too: you don't want to go through these things mid-quarter, and it's again something that we don't want to deal with, so let's just not do it. We're also changing a number of homework questions and adapting things to make this a little bit easier on everyone. All right. The last point I want to make is on communication. We're going to use Ed this quarter, so in general, if you have any questions, the best idea is to make a public post. That way the course staff, students, everyone can see it, and you have a broader group of
[00:23:45] people who can answer that question, and probably other people are thinking about that question too. So that's the best way of communicating with us. If there is a private question, make a private Ed post, and that way the course staff can see it; for example, if you have a question that could give away answers, it's a good idea to post it as a private post. And in general, if there are sensitive matters that you want to discuss, or OAE accommodations, you should email this particular email address. It goes to only four people: Percy and I, and Shiri and Faith, our student liaison and our head CA. So if you have any sensitive matter, just send an email to this email list that goes to the four of us. In addition to that, we're going to have periodic surveys. You already have a welcome survey on
[00:24:29] Canvas, so please take a look at that and fill it out, and that way we can start getting some feedback. And again, as the course is virtual, we would love to get more periodic feedback throughout the course, so it would be great to give us feedback and tell us what works and what doesn't, so we can adapt. And again, all these details, everything I've said so far, are on the course website. With that, I can take any questions about logistics; I know I covered quite a bit on logistics. If anyone wants to just ask a question, that's probably easier. [Student] So on the exam, if we're looking for a clarification, should we post that privately to Ed, or not at all, or should we email it? Assuming that it's not something that would give anything away; it's just supposed to be a clarification of what's
[00:25:23] intended by the question. You should post a private post on Ed that only goes to the course staff. [Student] That was a good guideline, but then, as far as coding in Python, what about the use of basic routines? Obviously not trying to copy code wholesale, but as far as using things like Stack Overflow and others, as virtual tutorials for the various things that you want to accomplish with the Python that you're writing. I assume that as long as it's just little routines, it's not a problem; the problem is when you're taking somebody's idea wholesale. Yeah, in general, try to write things yourself. When it comes to writing the
[00:26:31] code part of it, you can get ideas, you can discuss the idea with other people, or you can look at online forums for ideas, but when it comes to writing the code, try to just write it yourself. If there are specific things that you're not sure of, you should go to the CA office hours or our office hours and ask us about that specific instance, and we can talk about it then. Okay. All right, so let's move forward; now let's talk a little bit about the course content. What are we discussing, what is AI, what are we going to be covering in this class? So in general, in AI you're interested in solving realistic, complex problems that have a lot of messiness and uncertainty. If you think about a complex problem, let's say routing cars in a city with a
[00:27:22] lot of complex things happening in that city, how do you go about solving that? Let's say the question is just routing the vehicles. You're not going to just start writing code for it, right? Starting from scratch, without really having a formalism, and directly coding it seems pretty difficult. And in general, there's a gap between the code, the software and hardware, that we develop as AI scientists and engineers, and what is happening in reality: the real world, with all the messiness and complexities that exist. What AI, and what this course, is trying to do is bridge that gap: to figure out how we can take some of these real-world problems and make them simpler, in a way that is manageable, so we can develop algorithms
[00:28:08] and code for it. For that we have a paradigm in this class that we'd like to follow, and this paradigm has three core components, three pillars: modeling, inference, and learning. I'm going to talk a little bit about these. The idea is that we take a very difficult problem, we model it, and then we develop inference algorithms for it; and throughout this process the model could have a set of unknowns, and we use learning throughout to actually make our models better. So let me try to make this a little bit clearer moving forward. Let's go back to the real-world problem we were talking about: routing vehicles in a city. This is a big problem, and in general I would like to have a formalism. So what modeling does
[00:28:59] is take that complex problem and try to come up with a formalism, a mathematical way of thinking about that problem. And modeling, just by definition, is lossy: I'm not going to capture all the complexity that exists in the real world. All models are wrong, but some are useful, right? So under that idea, of course you're going to lose some of this complexity, but we're still going to come up with something that is useful for the goal that we have. Maybe I would like to find the shortest way of getting from one road to another road, and if that is my goal, I can basically model this real-world problem as a graph problem, where I have a bunch of edges and vertices: my vertices here are maybe my locations in the world, and the edges are maybe the roads that connect them. Okay, so this would be a
[00:29:47] graph model that represents that real-world problem. We're going to spend quite a bit of time in this class talking about modeling. And then, once I have a model, I can start asking questions about that model: I can ask what is the shortest path from one node to another node, or what is the most scenic path from one region to another region, or I might have different objectives that I would like to optimize. Inference is really a way of trying to solve that problem and give us an answer to some of these questions that we have here. How do we make predictions, how do we figure out what is the right path to take in this problem: that is the kind of thing inference gets us. And then finally, the last pillar is learning.
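To make the modeling-and-inference split concrete, here is a small sketch (my own, not from the lecture; the location names and road lengths are invented): the model is just a weighted graph, and inference is a shortest-path query over it.

```python
import heapq

# Model: a weighted graph. Vertices are locations, edges are roads
# with travel costs. (Hypothetical data, purely for illustration.)
graph = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}

def shortest_path_cost(graph, start, goal):
    """Inference: Dijkstra's algorithm answers 'what is the shortest path?'"""
    frontier = [(0, start)]
    best = {}
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node in best:
            continue  # already settled with a cheaper cost
        best[node] = cost
        if node == goal:
            return cost
        for neighbor, weight in graph[node].items():
            if neighbor not in best:
                heapq.heappush(frontier, (cost + weight, neighbor))
    return float("inf")

print(shortest_path_cost(graph, "A", "D"))  # A -> C -> B -> D costs 2+1+5 = 8
```

A different objective ("most scenic path") would keep the same model and swap in a different inference query; that separation is the point of the paradigm.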
[00:30:34] The way I want you to think of learning is this: if you think about that model, oftentimes you're not going to be able to write everything in that model with all the complexities. What we can do is write a skeleton for what we're trying to do, maybe a graph, but in that graph you might not know what the weights on the edges are. We might not be given the edge values here, because that would be too complicated to write, or we might just not have them a priori at the beginning. So we often have a model without parameters, and the goal of learning is to look at data and, from data, complete this model and add these parameters that were unknown at the beginning. So what learning is really doing is taking the complexity that we have in writing the specification, writing the model, and it takes that away and puts
[00:31:23] that into data. And given that there is data, I can take that data and then, based on how good that data is, or based on what I can learn from that data, complete my model and have a better model that I can actually do inference over. So we're going to have learning throughout this class as a pillar in every section that we'll talk about. All right, so modeling, inference, and learning are the three pillars that keep appearing throughout every week of this class. But what is our course plan? Our plan is really to talk about different types of models, starting from low-level intelligence all the way to higher-level intelligence, and we're going to basically go over a variety of these models. But before we start talking about these, we're going to actually spend two weeks talking about machine learning.
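That skeleton-plus-data idea can be sketched in a few lines (again my own toy, with made-up numbers): the graph structure is written down by hand, but the edge weights are left unknown and filled in from observed trips.

```python
from collections import defaultdict

# Skeleton of the model: we know which roads exist, not how long they take.
edges = [("A", "B"), ("B", "C")]

# Data: observed travel times for individual trips (invented numbers).
observations = [
    ("A", "B", 4.0), ("A", "B", 6.0),   # trips on road A-B
    ("B", "C", 3.0), ("B", "C", 3.0),   # trips on road B-C
]

# Learning: complete the model by estimating each unknown weight
# as the average observed travel time on that road.
totals, counts = defaultdict(float), defaultdict(int)
for u, v, t in observations:
    totals[(u, v)] += t
    counts[(u, v)] += 1
weights = {e: totals[e] / counts[e] for e in edges}
print(weights)  # {('A', 'B'): 5.0, ('B', 'C'): 3.0}
```

The complexity has moved out of the specification and into the data: better or more plentiful observations give a better-completed model, without changing a line of the skeleton.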
[00:32:10] This is just to get some of the basics of machine learning out of the way. Also, machine learning in general is a very powerful tool that has been quite impactful in the field of AI, so it's a good idea to learn some of these ideas in machine learning at the beginning, so that we can actually use it throughout the class when we are thinking about learning, modeling, and inference for the different types of models that we will discuss. Okay, so next week and the week after are basically going to be modules on machine learning. And I'm just spending a little bit of time on what machine learning is. So again, the role of machine learning is to take data, and from that data try to generate these models that were at the beginning incomplete, but now we
[00:32:56] can actually use them, and we can actually incorporate the data, the information that's in that data, into the model. And the idea of it is really moving from code to data: again, moving the complexity that exists in the code to complexity existing in the data. One other point about machine learning is that it kind of requires faith, right? If we have some data and we build the model based on that data, there's no reason, on the surface, that that model should work in a new scenario, that it should generalize to new settings. We'll talk about this idea of generalization quite a bit: when is it that the model can generalize to new settings? If I've trained it on some set of data of, let's say, house prices, how can I make sure that this model will actually work in a new setting, for a new house? And that kind of goes back to this question of generalization.
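The house-price example can be made concrete with a tiny sketch (all numbers invented): fit a model on some training houses, then judge it on a house it has never seen, which is what generalization asks about.

```python
# Toy generalization check (invented numbers): train on some houses,
# then evaluate on a held-out house the model has never seen.
train = [(1000, 200.0), (1500, 300.0), (2000, 400.0)]  # (sqft, price in $k)
held_out = (1200, 240.0)

# Learning: least-squares fit of price = w * sqft (no intercept term).
w = sum(x * y for x, y in train) / sum(x * x for x, _ in train)

# The interesting number is the error on the unseen house,
# not the error on the training set.
x_new, y_new = held_out
print(round(w * x_new, 1))  # prediction for the new house
```

Here the held-out prediction happens to be perfect because the fake data is exactly linear; with real data the gap between training error and held-out error is exactly the generalization question.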
[00:33:45] We'll spend time on that. All right, so that was machine learning. As we talk about machine learning in the first two weeks of the class, we're also going to spend a little bit of time talking about reflex-based models. These are kind of the lowest level of intelligence in terms of the modeling paradigms that we'll be talking about throughout the course. And here's an example of a reflex-based model: I'm going to ask you guys, what is this animal? Maybe you can put it in the chat. What was it? It was a zebra, right? And you were able to very quickly figure out that you just saw a zebra here. This is really based on your reflexes; this is really an example of what a reflex-based model could do. Other examples of
[00:34:37] reflex-based models are things like linear classifiers or deep neural networks. The reason I'm calling these low-level intelligence is that we're not doing a lot of reasoning here: we basically have a feed-forward model, and we're not putting much computation into responding and saying, well, that was a zebra. We were just able to quickly say that that was a zebra. These reflex-based models are the most common form of models in machine learning. They're often fully feed-forward: no backtracking, no reasoning about what was going on, just evaluating the model. Deep neural networks are an example of this; linear classifiers are an example of this.
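A minimal sketch of why inference in a reflex-based model is so cheap (my own example, with made-up weights): a linear classifier is one dot product and one sign, a single feed-forward pass with no search at all.

```python
# A reflex-based model in miniature: a linear classifier.
# Inference is a single feed-forward evaluation, no backtracking.
# (Weights are invented for illustration.)
weights = [2.0, -1.0, 0.5]

def predict(features):
    """One pass: weighted sum of the features, then the sign."""
    score = sum(w * f for w, f in zip(weights, features))
    return 1 if score >= 0 else -1  # e.g. "zebra" vs "not zebra"

print(predict([1.0, 0.5, 2.0]))   # score 2.0 - 0.5 + 1.0 = 2.5 -> 1
print(predict([0.0, 3.0, 1.0]))   # score -3.0 + 0.5 = -2.5 -> -1
```

A deep network is the same shape of computation, just with more layers in the forward pass; the "intelligence" is baked into the weights, not into any reasoning at prediction time.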
[00:35:21] That's actually why, as we discuss machine learning, we're also going to spend a little bit of time thinking about reflex-based models: inference is extremely simple, and we just call the model. All right, so moving along, one level higher on top of the reflex-based models, we're going to talk about state-based models, and we're going to talk about three types of them: search problems, MDPs, and adversarial games. So what are state-based models? Here is an example: let's say you want to play a game of chess, and you want to figure out what the next move of white should be. This is not the same as detecting whether that animal was a zebra; this is actually a lot more difficult than that. You actually need to sit down and do a little bit of reasoning, figure out what state of the world you're in, and figure out how the world
[00:36:10] is going to evolve. So there is this notion of sequences of actions and sequences of states that come after each other, like A leading to B and so on, and this brings us to the idea of state-based models. They have many applications, including in games: if you think about games like chess, Go, Pac-Man, StarCraft, these are all examples where state-based models are a good way of modeling them. They show up in robotics all the time: think about motion planning, like getting a robot arm to move from one location to another; we oftentimes use state-based models as a way of formalizing that. They also show up in natural language generation, machine translation, image captioning. They're basically all throughout AI, and they're a very good way of thinking
[00:36:59] about what sufficient information you need to know at the current time, how that should evolve in the next time step, and then adding an ordering of going from this state to the next state. So we'll talk about three types of state-based models. We'll talk about search problems, where you can actually control everything: you have a state, and based on the action that you take, you end up in a new state. We'll talk about Markov decision processes, which make search problems a little bit more difficult by adding uncertainty that comes from the world. So basically, these Markov decision processes are state-based models where you're playing against nature: nature gives you some probabilities, you look at coin tosses, and based on that you proceed, so there is this notion of uncertainty. And then we'll spend some
[00:37:47] time talking about games, adversarial games, where you're not playing against nature, which is probabilistic; instead you're playing against another opponent, which is also very intelligent and is making decisions against you, as opposed to with you. And we'll basically go over these different types of state-based models a little bit. Okay, so as part of the homework for state-based models, we're going to play around with the game of Pac-Man; I just want to show a quick demo of this game here. So you're going to play around with the game of Pac-Man and basically come up with algorithms for Pac-Man that can avoid ghosts and eat these food pellets, and it will be kind of fun playing around with it. Let me go back to my slides.
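A search problem of this flavor can be sketched very compactly (my own toy, not the course's Pac-Man code): the state is a grid position, actions move you deterministically to a neighboring state, and breadth-first search finds the fewest moves to a goal.

```python
from collections import deque

# Tiny grid search in the spirit of Pac-Man (hypothetical layout).
# State = (row, col); actions move one step; walls are blocked cells.
walls = {(1, 1)}
rows, cols = 3, 3

def successors(state):
    """The transition model: which states one action can reach."""
    r, c = state
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in walls:
            yield (nr, nc)

def bfs(start, goal):
    """Breadth-first search: fewest moves from start to goal."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        state, depth = frontier.popleft()
        if state == goal:
            return depth
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None

print(bfs((0, 0), (2, 2)))  # 4 moves, routing around the wall at (1, 1)
```

In a search problem you control everything; an MDP would replace `successors` with a probability distribution over next states, and an adversarial game would interleave an opponent's moves.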
[00:38:38] As you're thinking about Pac-Man, and state-based models in general, the things to think about are: what is the notion of state, how do you transition from one state to another, and how can you come up with a strategy, a policy, that gets you from one point to another so that you avoid the ghosts, eat your food pellets, and so on. These are some of the questions that we're going to talk about when we discuss state-based models. All right, so moving forward, we're then going to move to the next level of intelligence, and that is variable-based models. An example of a variable-based model is something like a game of sudoku. If you think about state-based models, there's a notion of sequential ordering of states: you have to go through A to get to B. If you think about moving through a graph to solve shortest path, you actually
[00:39:24] need to reach city one first, and then after that city two. But there is a set of problems that don't really require that type of strict ordering. Think of the game of sudoku: you have a bunch of numbers, and you want to make sure that you can fit the digits one through nine in every row and column, and the order in which you put in these numbers doesn't really matter. You can put the nines first, or you can put the ones first, and that really doesn't matter. That brings us to this idea of variable-based models, where we don't have this strict ordering, and because of that we can do something that's a little bit more intelligent, which helps us come up with better algorithms in these settings. So we will talk about two types of variable-based models.
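The sudoku idea can be phrased as variables plus constraints, sketched here on a deliberately tiny 2x2 "sudoku" of my own invention: fill the grid with 1 and 2 so each row and column has distinct values. Note that the solver assigns variables in an arbitrary order; the ordering carries no meaning, which is exactly the contrast with state-based models.

```python
import itertools

# Miniature constraint satisfaction problem (my own toy, not the
# course's): variables are the cells of a 2x2 grid, domain {1, 2},
# constraint: each row and each column contains distinct values.
cells = [(r, c) for r in range(2) for c in range(2)]

def consistent(assignment):
    for cell1, cell2 in itertools.combinations(assignment, 2):
        (r1, c1), (r2, c2) = cell1, cell2
        if (r1 == r2 or c1 == c2) and assignment[cell1] == assignment[cell2]:
            return False
    return True

def backtrack(assignment):
    """Assign one unassigned variable at a time, undoing dead ends."""
    if len(assignment) == len(cells):
        return dict(assignment)
    var = next(c for c in cells if c not in assignment)
    for value in (1, 2):
        assignment[var] = value
        if consistent(assignment):
            result = backtrack(assignment)
            if result is not None:
                return result
        del assignment[var]
    return None

solution = backtrack({})
print(solution)  # a valid filling of the 2x2 grid
```

Real sudoku is the same structure with 81 variables, domain 1..9, and row/column/box constraints; only the bookkeeping grows.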
[00:40:13] We'll talk about constraint satisfaction problems: these are settings where we have hard constraints. Sudoku was an example: you have a hard constraint, you have to actually fit one through nine in your board. Or scheduling-type problems: a person cannot be at two places at the same time. So there are very strong, strict relations between the different variables that exist. But in addition to that, we also have Bayesian networks, which try to take those hard constraints and make them soft: there are soft dependencies when you think about Bayesian networks, unlike, let's say, sudoku or scheduling. An example is, let's say you want to track an airplane, or you want to track a car. If you're tracking your car, you might have a set of sensors on that car, and those sensors are noisy: they're not going to give you the ground truth of where the car is.
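A toy version of that car-tracking setup (my own sketch, with invented probabilities): the car sits at one of five positions, a noisy sensor reports where it might be, and we also exploit the knowledge that the car moves smoothly. Folding both together is the forward filtering computation that Bayesian-network-style models support.

```python
# Toy car tracking (invented numbers): positions 0..4, noisy sensor,
# and a smooth-motion model; filtering combines the two.
positions = range(5)

def transition_prob(prev, cur):
    # The car stays put or moves one step right, equally likely.
    return 0.5 if cur in (prev, prev + 1) else 0.0

def sensor_prob(true_pos, reading):
    # Sensor is right 80% of the time, otherwise off by one either way.
    if reading == true_pos:
        return 0.8
    return 0.1 if abs(reading - true_pos) == 1 else 0.0

belief = [1.0 / 5] * 5          # start with a uniform belief
for reading in [1, 2]:          # made-up sensor readings over two steps
    # Predict: push the belief through the motion model.
    predicted = [sum(belief[p] * transition_prob(p, c) for p in positions)
                 for c in positions]
    # Update: weight by the sensor model and renormalize.
    belief = [predicted[c] * sensor_prob(c, reading) for c in positions]
    total = sum(belief)
    belief = [b / total for b in belief]

print(max(positions, key=lambda c: belief[c]))  # most likely position
```

Neither source of information alone pins the car down; the soft dependencies between consecutive positions and between position and reading are what make the estimate sharp.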
[00:40:59] You also know that your car cannot teleport, so where it was at the previous time step and where it is at the current and next time steps are related to each other. Based on these different types of relations, of where the car is and where the car is going to be, plus the fact that you have these noisy sensor readings, you can have these soft dependencies between the variables, and that allows you to estimate where the car is. That's the topic we'll discuss through Bayesian networks. We'll have a homework on this, and it will actually be about tracking cars; that'll be exciting. All right, and then finally, the last component that we are going to discuss is going to be on logic, and this brings us to the highest level of intelligence. As an instance of an example that uses logic, we can think
[00:41:48] As an example that uses logic, we can think of a virtual assistant. What do you want from a virtual assistant? You oftentimes want to tell it some information, and you also want to be able to ask it some questions and expect it to respond, and maybe you would want to use natural language as the way of communicating with it. We actually go through a virtual assistant example as part of the logic homework, and I want to show a quick demo of that here. Let me see if I can bring the terminal to the right window. Ah, there you go. Okay, so this is a tool that we're going to play around with during the logic homework. Basically it's a virtual assistant: you can give it information, and you can ask it questions.
[00:42:45] Let me try an example. Let me actually give it some information: I'm going to say "Alice is a student." Okay. (Sorry, Dorsa, could you zoom in? Oh, yeah.) Okay, so I told it Alice is a student, and it just learned something. I can ask it now, "Is Alice a student?" What should it say? It says yes, right, because I just told it Alice is a student. I'm going to ask, "Is Bob a student?" What should it respond? It should probably say "I don't know," right? Because how would it know? "I don't know." Let me give it some facts. I can say "Students are people." Okay. Then I can say "Alice is not a person." Let's see what it says in response to that. Okay, it says "I don't buy that." So it understands contradiction: I told it students are people, I showed it a generalization, and now this is a contradiction to that, and it understands that.
[00:44:01] I can say "Alice is a person." Let's see what it says: it confirms, "I already knew that." Okay. Let's give it some more information. "Alice is from Phoenix," maybe, let's do that. "Alice is from Phoenix": "I learned something." We can say "Phoenix is a hot city": "I learned something." I can say "Cities are places": "I learned something." Let me actually make this a little bit smaller so you can see it. Okay: "If it is snowing, then it is cold." So I'm teaching it this kind of if-then type of statement. "I learned something." Okay, I'm going to ask it, "Is it snowing?" What should it say? It doesn't know. Okay, so it says "I don't know." So I'm going to give it more information: "If a person is from a hot place and it is cold, then she is not happy." Okay, so I'm giving it this more complicated if-then type statement. "I learned something."
[00:45:14] I'm going to ask, "Is it snowing?" What would it say? It doesn't know, right: "I don't know." Let's say "Alice is happy." Okay, now I'm going to ask, "Is it snowing?" What would it say? Right: it's not snowing. Okay, yeah. So this was just an example going over this virtual assistant, and you'll play around with it in the logic module. You will be thinking about this idea of giving information, asking for information, and the logical relationships between them, and this will be something we will work on, so I just wanted to quickly show this demo. But one thing to notice here is that we were giving it heterogeneous pieces of information. This was very different from giving it millions of pictures of cats and training a neural network; I was giving it very heterogeneous information.
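The tell/ask behavior in the demo can be sketched with a toy forward chainer. To be clear, this is only a sketch with invented names, not the homework assistant: it handles signed facts, simple if-then rules, and contradiction detection, but not the richer inference (like the not-snowing deduction) that the real system performs:

```python
# Tiny tell/ask loop in the spirit of the demo: facts are signed
# atoms like ("student", "alice", True), rules are implications
# (premises -> conclusion), and ask() answers "yes", "no", or
# "I don't know". A toy forward chainer, not the homework assistant.
facts = set()
rules = []          # each rule: (list_of_premise_facts, conclusion_fact)

def close(fs):
    # Forward chaining: fire rules until no new facts appear.
    fs = set(fs)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if all(p in fs for p in premises) and conclusion not in fs:
                fs.add(conclusion)
                changed = True
    return fs

def tell(fact):
    closed = close(facts | {fact})
    # Contradiction check: some atom derived both true and false.
    if any((pred, arg, not val) in closed for pred, arg, val in closed):
        return "I don't buy that"
    facts.add(fact)
    return "I learned something"

def ask(pred, arg):
    closed = close(facts)
    if (pred, arg, True) in closed:
        return "yes"
    if (pred, arg, False) in closed:
        return "no"
    return "I don't know"

# "Students are people" (specialized to alice for this toy sketch).
rules.append(([("student", "alice", True)], ("person", "alice", True)))
tell(("student", "alice", True))    # "I learned something"
```

With these definitions, `ask("student", "alice")` yields "yes", `ask("student", "bob")` yields "I don't know", and `tell(("person", "alice", False))` is rejected with "I don't buy that", mirroring the three behaviors shown in the demo.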
[00:46:13] The system was able to reason about this information in a very deep way, right? It was making these deep connections, and I could ask it these questions, and that's very exciting: being able to have these kinds of deep interactions between the symbols that we are providing it. All right, so that brings us to the end of this module, where we were thinking about different types of models. Just as a quick recap: in this class we are going to talk about models from low-level intelligence all the way to high-level intelligence: reflex-based models, state-based models, variable-based models, and logic. And for each one of these models, we're going to talk about the usual paradigm.
[00:47:02] We'll talk about modeling in each one of these settings; then we'll talk about inference: what are the different inference algorithms we can use? And in addition to that, we'll talk about learning: how can we take data and learn and improve our models for each one of these components? That paradigm keeps showing up throughout the class, basically every week. All right, so now let's spend five minutes and have an icebreaker. He's going to put us into groups of four in breakout rooms, and during these five minutes let's just introduce ourselves to the others. And maybe to set the stage, let's discuss a question, and the question is: what is the biggest benefit of AI, and what is the biggest risk of AI? When you come back, try to put that on the chat.
[00:47:52] We will discuss and go from there. Okay, so let's spend five minutes in breakout rooms.

[00:47:57] So, it was good talking to some of you during the breakout rooms. Maybe you can put some of your responses, some of the things you discussed, on the chat, as a way of discussing some of them. Also, a quick thing: don't direct-message me on chat; I'm barely looking at it. If there are any questions, please ping the CAs, or email me later on, or post questions on Ed. Okay. All right, so, yeah: biggest benefits, biggest risks, anyone have thoughts? Okay, thank you for starting this. "Improving people's lives," "tangible applications," "mutual assistance"; biggest risks: "ML fairness," "ethics." Yeah, these are all great points; I'm going to talk about them a little bit later on.
[00:49:00] But now let's continue with the next segment, where I want to give a little bit of the history of AI. This is going to be brief; I don't want to go into too much detail, and it's not going to be a complete history, but I think it's a good idea to talk about it, because it gives a little insight into why we are where we are today and how things took shape over time. If you want to give a history of AI, you can really go back to 1950, when Alan Turing put out his landmark paper, "Computing Machinery and Intelligence." In this paper, Turing asked a question: can machines think? And he came up with his answer, which was the imitation game, which you might know as the Turing test.
[00:49:49] The idea of the Turing test is that a machine passes if it is able to fool a person into thinking that it's actually a human. This paper was really foundational in the sense that it started allowing us to think about intelligence a lot more carefully and to actually try to formalize it in a better way; it was one of the first foundational works on formalizing this idea of intelligence. Now, we might argue about whether the Turing test is a good test for measuring intelligence, and we might have various opinions on that, but that part is not really what matters. The part that matters is thinking about intelligence and being able to formalize it.
[00:50:38] One other thing Turing provided in this paper was the idea of separating the question you're trying to answer, the what, from the how: how are we going to answer this question? Turing came up with the imitation game, and basically what this gave us was an objective specification: this is the thing we are trying to get at, our specification. But how we do it, how the machine really does it, he didn't specify. This modularity of specifying what we are trying to get and how we are trying to get it is a really foundational idea that we have been using in a lot of our algorithms, and we'll see it throughout this class: separating the objective from the algorithm, from how we go about it, is actually quite important, and it's a very good foundational idea.
[00:51:28] One interesting thing is that Turing, at the end of this paper, also provided some ideas about how we should go about it, and he talked about two different approaches. One was a very abstract way of going about the problem, kind of a top-down view, like how we would go about solving chess; this is really related to the idea of symbolic AI that I'm going to talk about. He also provided another potential way of going about it, which is having machines that have sense organs, a.k.a. sensors, and then teaching them like a child. This idea of taking a machine, putting sensors on it, letting it sense data, and having it learn from that data is very related to the idea of neural AI. So since this point, 1950, when Turing put out this paper, there have
[00:52:18] been three different flavors of AI around: symbolic AI, neural AI, and statistical AI. I want to give a brief history of each one of these. Let's start in 1956, with the story of symbolic AI; this is the first flavor of AI I want to talk about. The term AI really goes back to 1956, when John McCarthy organized a workshop at Dartmouth College. McCarthy was later a faculty member in Stanford CS, and he actually created the Stanford AI Lab. He organized this workshop at Dartmouth College that summer and invited a lot of other big names: Marvin Minsky, Allen Newell, Herbert Simon. He invited all of these people, and the goal of the workshop was to think about intelligence.
[00:53:13] They had a very ambitious goal: they wanted to think about every aspect of learning and every feature of intelligence, and to model them so precisely that they could have a machine that could simulate them. That is a very ambitious goal, right? They were really after generality: they wanted to figure out the general principles of intelligence and learning, so that they could have an artificial intelligence, an intelligent agent, that could simulate them. That was really exciting, and immediately after this workshop all these people went their separate ways and started producing really cool systems. This was really the birth of AI, and you started seeing a lot of early successes. In 1952, Arthur Samuel put out one of the first checkers
[00:54:02] programs, which was able to play checkers at the level of an amateur, which was really exciting. In addition to that, in 1955, Newell and Simon came up with one of our first theorem provers: they basically had a system that could solve problems and prove theorems in a general way. They came up with a proof for a theorem that was actually more elegant than what people had before, and they tried to publish a paper on this proof, but the paper got rejected because the reviewers thought the theorem already existed. Still, it was really exciting to be able to have systems that could prove theorems, play checkers, and solve problems in general. And there was a lot of optimism from all these really famous people in the field.
[00:54:51] They all had a lot of optimism about what is possible with AI. Herbert Simon said, "Machines will be capable, within twenty years, of doing any work a man can do." Marvin Minsky said that within ten years the problem of artificial intelligence would be substantially solved. Claude Shannon said, "I visualize a time when we will be to robots what dogs are to humans, and I'm rooting for the machines." These are not random people on the street; these are famous people, founding fathers of AI, and this is some of the overwhelming optimism people had around that time about what we could actually do with these AI systems. Unfortunately, we started seeing very underwhelming results. Around this time, the government really cared about the problem of machine translation, and there was a lot of funding around it, and then we started seeing results that were
[00:55:38] kind of underwhelming. Here is a made-up example, but the results were things of this form: you might have a text that says "the spirit is willing, but the flesh is weak," translate that to Russian and translate it back to English, and get a text that says "the vodka is good, but the meat is rotten," which is not very good. As we started seeing these results, governments started putting out reports about how these results were not so great, and they started cutting off funding for AI research. This is around the time we started seeing the first winter of AI: a lot of optimism that wasn't really going anywhere, and then this first winter of AI, and that wasn't so great.
[00:56:31] If you think about this first early era of AI, what were some of the problems? First off, we had very limited computation. A lot of these problems were written as logical problems and usually solved as search problems, where the search space just grows exponentially, and with the limited hardware we had, it was simply not possible to solve these very difficult problems. But even if we had had infinite compute at that time, which we didn't, there was another problem: we had limited information. To solve some of these very complex AI problems that people were thinking about, they needed to write out the problems and the knowledge around them using words and objects, writing out the concepts, and it was very difficult to actually provide all this information.
this information; we had really limited information about some of these concepts. But regardless, we started seeing a lot of interesting contributions come out of this era. Even though it was a failure and we had this winter of AI, there were a lot of interesting ideas that came out around this time: we had the Lisp programming language, and we had ideas like garbage collection and time sharing, and a lot of these ideas are actually associated with John McCarthy. So it was exciting to see a lot of advances, even though the problems were still there and we couldn't really solve the big problem. [00:57:47] And this really brings us to the era of the 70s and 80s. In the 70s and 80s, people really started thinking about this idea of knowledge and building knowledge-based systems. The core idea was that knowledge is really
the key: if we can encode knowledge, if we can bring in ideas from experts, bring in domain knowledge and incorporate that into the system, then we can actually solve interesting AI questions. So this was the rise of expert systems, where we basically elicit domain-specific knowledge from an expert and encode it into if-then-else type statements, into rules that the system can call on to solve various types of problems. [00:58:27] There was also another shift around this time. The first era of AI, the John McCarthy, Dartmouth College workshop era, was all about understanding intelligence: being able to say, well, what is human intelligence, and can we simulate that? And that didn't really work out. But in this new era, people started
changing paradigms, and they started thinking about applications a lot more. So, sure, you're not going to be able to think about intelligence and simulating intelligence, but I can build systems that can be used in chemistry, or in medical diagnosis, or in business. This was the first era in which people started building AI systems that maybe didn't say much about simulating intelligence, but were about solving interesting, useful problems that could be used in industry. [00:59:16] So, lots of events during this time. These knowledge-based systems really helped us with both the information gap and the computation gap: they allowed us to incorporate knowledge and information, and by doing so they allowed us to prune the search space, which meant we needed less compute to solve some of these
problems. So that was exciting, and this was the first time we were seeing real applications that actually impacted industry, which was also very exciting. [00:59:46] But there were still some problems around this era. One of the problems was that these rules were very deterministic, and they couldn't really handle the uncertainty that exists in the real world; we had all these deterministic connections and rules, and they just weren't coming together and capturing the complexity that exists in the world. In addition to that, the rules were becoming very complex very quickly. So this is a quote from Terry Winograd, who was a faculty member in HCI in the CS department at Stanford, but who at the time was actually a faculty member working in AI at MIT. Here is what he said about these knowledge-based
systems. He said, well, these systems are a dead end: they have very complex interactions that are difficult to handle, and there are just really no easy footholds. And this brings us to the second winter of AI. So, lots of excitement, we were seeing real applications, but there was still quite a bit of difficulty in extending these systems. [01:00:46] And this was kind of the end of this era of symbolic AI; symbolic AI really dominated AI for many decades. Now I want to go back in time. I'm now in 1987, but I want to go back and tell you a little bit of the history of neural AI, where that started and what its progression was, and that takes us to 1943.
So, going back in time, let's think about artificial neural networks and how they started. In 1943, McCulloch and Pitts came up with the first artificial neural network, where they basically modeled a single neuron. They were thinking about simple logical relations, like ANDs and ORs, and they weren't thinking about learning rules or anything of that form at that point. So that is 1943, the very first version of artificial neural networks. And in 1949, Hebb came up with the idea of a learning rule. This learning rule was very simple: cells that fire together wire together. This learning rule didn't really work well, and it was unstable, but it was one of the first learning rules that was put in place.
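Hebb's rule is usually written as Δw_i = η · x_i · y: strengthen a weight whenever its input x_i and the output y are active at the same time. A minimal sketch of the rule and its instability (the function name, inputs, and learning rate here are illustrative choices, not from the lecture):

```python
# Hebb's rule ("cells that fire together, wire together"): strengthen a
# weight whenever its input and the output are active together.
# Illustrative sketch only -- names and the learning rate are made up.

def hebbian_update(w, x, y, lr=0.1):
    # delta w_i = lr * x_i * y  -- purely correlational, no error signal
    return [wi + lr * xi * y for wi, xi in zip(w, x)]

w = [0.0, 0.0]
for _ in range(5):
    x = [1.0, 1.0]                                   # both input "cells" fire
    y = 1.0 + sum(wi * xi for wi, xi in zip(w, x))   # the output fires too
    w = hebbian_update(w, x, y)

print(w)  # both weights keep growing across iterations
```

Because the update is purely correlational, with no error signal, the weights only ever grow here, which is the instability mentioned above.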
And finally, in 1958, you started seeing some advances in artificial neural networks. This is when Rosenblatt came up with the perceptron algorithm for a single-layer neural network, which is basically a linear classifier. The perceptron algorithm was being used until very recently, and it showed a lot of success; it was actually very powerful, and there was a lot of excitement around it. [01:02:20] In 1959, we started seeing the analog of linear regression, which was ADALINE, and a multi-layer extension of it, MADALINE. MADALINE was actually used for removing echoes on phone lines, and this was again one of the very first times that people used artificial neural networks for a real application. And 1969 was an important year for artificial neural networks.
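Rosenblatt's perceptron update is simple: when the prediction is wrong, nudge the weights toward (or away from) the misclassified example. A from-scratch sketch on the AND function, which is linearly separable (the dataset, epoch count, and names are my own choices, not from the lecture):

```python
# Perceptron: a single "neuron" that learns a linear decision boundary.
# Trained here on AND, which is linearly separable.

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(data, epochs=20, lr=1.0):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            err = target - predict(w, b, x)   # +1, 0, or -1
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x) for x, _ in AND])  # [0, 0, 0, 1], matching the targets
```

On a linearly separable dataset like this, the perceptron convergence theorem guarantees the algorithm finds a separating line in finitely many updates.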
That year, Minsky and Papert wrote a book on artificial neural networks called Perceptrons, in which they basically tried to analyze the mathematical properties of linear models. What they showed was actually something very simple: a single-layer neural network is a linear classifier, and it is not going to be able to compute the XOR function. This book is really associated with shutting down research on artificial neural networks; that was a time when people started thinking that maybe these kinds of artificial neural networks are not very powerful and maybe we should stop doing research on them, even though the book wasn't really saying anything about these more general neural networks. [01:03:28] Regardless, we started seeing a revival of neural networks around the 1980s, and this was with the rise of convolutional neural networks.
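The XOR limitation from Minsky and Papert's book can be checked directly: no linear threshold unit w1*x1 + w2*x2 + b > 0 gets all four XOR cases right. A brute-force sketch over a grid of candidate weights (the grid is an illustration, not a proof; a short algebraic argument follows):

```python
# Checking the Perceptrons-book observation by brute force: no linear
# threshold unit w1*x1 + w2*x2 + b > 0 computes XOR on all four inputs.

XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def separates(w1, w2, b, data):
    return all((w1 * x[0] + w2 * x[1] + b > 0) == bool(t) for x, t in data)

grid = [i / 4 for i in range(-20, 21)]  # candidate values in [-5, 5]
found = any(separates(w1, w2, b, XOR)
            for w1 in grid for w2 in grid for b in grid)
print(found)  # False: no candidate linear unit solves XOR
```

The algebra is short: the four constraints b <= 0, w2 + b > 0, w1 + b > 0, and w1 + w2 + b <= 0 are contradictory, since adding the two middle ones gives w1 + w2 + 2b > 0, which together with b <= 0 contradicts the last one.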
This came under the umbrella of connectionism. These very first convolutional neural networks were trained in a very ad hoc way, but 1986 was around the time we started seeing better, more principled ways of training these systems. Around this time, Rumelhart, Hinton, and Williams popularized, or kind of reinvented, the idea of backpropagation, and that added a lot more principle to how we should train these systems. [01:04:11] And 1989 was again one of the first times you started seeing these systems used in practice: Yann LeCun applied convolutional neural networks to recognizing handwritten digits, and he actually deployed this with USPS for detecting the digits of zip codes, which was really exciting.
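Backpropagation, as popularized by Rumelhart, Hinton, and Williams, is just the chain rule applied layer by layer. A sketch for a tiny 2-2-1 sigmoid network: compute the gradient of a squared-error loss with hand-coded backprop, then confirm it against a finite-difference estimate (the network size, seed, and names are illustrative assumptions, not from the lecture):

```python
import math
import random

# Backprop sketch: a 2-2-1 sigmoid network. We compute gradients of a
# squared-error loss with the chain rule, then verify one of them
# numerically with a central difference.

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # hidden
b1 = [0.0, 0.0]
W2 = [random.uniform(-1, 1) for _ in range(2)]                      # output
b2 = 0.0

def forward(x):
    h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(2)]
    y = sigmoid(W2[0] * h[0] + W2[1] * h[1] + b2)
    return h, y

def loss(x, t):
    _, y = forward(x)
    return 0.5 * (y - t) ** 2

x, t = [1.0, 0.0], 1.0
h, y = forward(x)

# Backward pass: push d(loss)/dy through the output unit, then the hidden layer.
dy = (y - t) * y * (1 - y)                                # error at the output
dh = [dy * W2[j] * h[j] * (1 - h[j]) for j in range(2)]   # at each hidden unit
grad_W1_00 = dh[0] * x[0]                                 # d(loss)/d W1[0][0]

# Numerical check: nudge W1[0][0] and compare with a central difference.
eps = 1e-6
W1[0][0] += eps
up = loss(x, t)
W1[0][0] -= 2 * eps
down = loss(x, t)
W1[0][0] += eps
numeric = (up - down) / (2 * eps)
print(abs(grad_W1_00 - numeric) < 1e-6)  # True: backprop matches the numerics
```

A training loop just applies these gradients repeatedly (w -= lr * grad), and with a hidden layer such a network can represent XOR, the function a single perceptron cannot.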
But still, this idea of artificial neural networks was a niche; it wasn't something everyone was working on. And that lasted until the era of deep learning, the 2010s. Part of the reason was that it was actually really difficult to train these models. But in 2006, Hinton et al. developed an unsupervised layer-wise pre-training system that helped with pre-training some of these neural networks, reducing the effort that goes into training these models. [01:05:02] The break really happened around 2012, when we started seeing systems like AlexNet that were giving us huge gains in object recognition. These systems basically revolutionized and transformed the field of computer vision overnight: the kind of computer vision course that I
took in 2012 was actually pre-neural networks, and it was very different from what is being taught today; this basically changed the field overnight, with the rise of convolutional neural networks and training these systems to do object recognition. [01:05:37] And finally, in 2016, we started seeing things like AlphaGo, another breakthrough. AlphaGo was basically using deep reinforcement learning to defeat a world champion Go player. That was a game people were thinking was a lot more difficult, and it was really exciting to see deep learning and deep reinforcement learning able to solve some of these problems. [01:06:02] All right, let me try to wrap up, because I know it's almost 3 p.m., and we'll release modules on the rest of the lecture later today.
But let me just give you some food for thought. I've talked about symbolic AI, and I have talked about neural AI. Symbolic AI is really a top-down view whose roots go back to logic: you had these very big goals, like building a virtual assistant. Neural AI, on the other hand, is more bottom-up: it's trying to solve these perceptual tasks. The two might seem to have very philosophical differences, they might seem contradictory, but they're actually not; there are a lot of deeper connections between them, and today people are actually thinking about integrating them in ways that we weren't able to do before. And even if you go back to the history of it, what McCulloch and Pitts were doing with the first neural network
was actually analyzing the properties of a logical system. Or AlphaGo: if you think about AlphaGo, it's a very logical game, you write the rules of the game in logic, and it's using neural networks to solve that game. So there are deeper connections between these two views of AI, and they really come together. [01:07:22] All right, sorry for going over a little bit. What is left of this lecture is really talking a little bit about statistical AI, and we'll release modules on these, thinking about where statistical AI comes into play and wrapping up the history of AI. The other part that is left is talking about AI and some of its risks and benefits, which is something you guys talked a little bit about during the breakout rooms, but we'll
also talk about that in lecture, so if you want to watch these lectures later, that would be cool. And with that, if there are any questions, I can take any questions; otherwise, I'll see you guys next week.

================================================================================
LECTURE 002
================================================================================
AI History | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=z8fEXuH0mu0
---
Transcript

[00:00:05] The next thing I want to do is talk a bit about the history of AI. Obviously, the history of AI here is going to be necessarily abbreviated and simplified, but I just want to give you an appreciation for how multifaceted the history is, and how rich and sometimes controversial it is. [00:00:24] A natural starting point to talk about the history of AI is Alan Turing's famous 1950 paper, Computing Machinery and Intelligence. In this paper, he asked the question: can machines think?
And he proposes the imitation game as his solution, more popularly known as the Turing test. Some of you probably know it: the Turing test is said to be passed by a machine if it can fool a human judge into thinking that it is actually a human being. [00:00:54] This paper is remarkable not because it built a system or proposed new methods, but because it framed the philosophical discussion of what intelligence is for years to come. And you just have to appreciate how difficult a notion intelligence is to pin down; this was really the first actionable, formal answer to the question: can machines think? Now, whether you think that working on the Turing test is a good idea that will lead to progress is questionable and controversial, but at least philosophically it's quite thought-provoking. [00:01:30] So for us, one major takeaway of the Turing test, which was not really
highlighted, is this objective specification. Note that the test itself is meant to capture what a system ought to be doing, independent of how you get there: it doesn't say whether the system should be using neural networks or logic-based methods, and so on. And this modularity is going to be really important to us in this course. [00:02:00] At the end of the paper, Turing does speculate on what might work. He talks about two possible approaches. You could take a top-down approach and try to tackle abstract problems such as chess; this is the route taken by symbolic AI. Or you could, quote unquote, provide the machine with the best sense organs, that is, sensors, and teach it like a child; this is more the approach taken by neural and statistical AI. Both have been tried, and we'll see how all three types of AI, symbolic, neural, and statistical,
kind of meld together at the end. [00:02:43] So to start our first story, let's go to the summer of 1956. The place was Dartmouth College. John McCarthy, who actually founded the Stanford AI Lab, organized a workshop; he gathered the brightest minds of the time. In attendance were Marvin Minsky, Allen Newell, and Herbert Simon, all of whom went on to make seminal contributions to AI. The participants set out a not-so-modest proposal: they claimed that every aspect of learning, or any other feature of intelligence, can be so precisely described that a machine can be made to simulate it. So they were really shooting for the moon; they were after generality. And this was post-war, computers were just coming on the scene, it was a really exciting time, and people were really ambitious. [00:03:38] During this time there were a few systems that were built. Arthur Samuel built a
computer program that could play chat checkers at a reasonable amateur [00:03:47] chat checkers at a reasonable amateur level and actually featured some uh you [00:03:50] level and actually featured some uh you know machine learning [00:03:52] know machine learning um [00:03:53] um ali newell and herbert simon [00:03:55] ali newell and herbert simon came up with a logic theorist that could [00:03:57] came up with a logic theorist that could prove theorems [00:03:59] prove theorems for one theorem they actually found a [00:04:00] for one theorem they actually found a proof that was better than the human [00:04:02] proof that was better than the human written proof and they tried to submit a [00:04:04] written proof and they tried to submit a paper on the result but the paper got [00:04:06] paper on the result but the paper got rejected because [00:04:08] rejected because the reviewers said it was not a new [00:04:10] the reviewers said it was not a new theorem [00:04:11] theorem what the reviewers didn't realize that [00:04:13] what the reviewers didn't realize that the third author was actually a computer [00:04:15] the third author was actually a computer program [00:04:17] program later they worked generalized these [00:04:19] later they worked generalized these ideas to the general problem solver [00:04:22] ideas to the general problem solver which [00:04:22] which was aimed at solving any problem [00:04:24] was aimed at solving any problem provided it could be suitably encoded in [00:04:26] provided it could be suitably encoded in logic and again this carries forward the [00:04:29] logic and again this carries forward the ambitious general intelligence [00:04:31] ambitious general intelligence agenda [00:04:34] sand this was a time of high optimism [00:04:38] sand this was a time of high optimism with the leaders of the field who are [00:04:40] with the leaders of the field who are all really impressive thinkers [00:04:42] all really impressive thinkers 
predicting ai would be solved in a [00:04:45] predicting ai would be solved in a matter of years [00:04:48] but we know that [00:04:50] but we know that they didn't get solved in 10 years and [00:04:52] they didn't get solved in 10 years and there were some [00:04:54] there were some tasks such as machine translations which [00:04:56] tasks such as machine translations which were very stubborn [00:04:57] were very stubborn so this is now a folklore story i don't [00:05:00] so this is now a folklore story i don't know how true it is but it's amusing [00:05:01] know how true it is but it's amusing nonetheless um you take a sentence like [00:05:04] nonetheless um you take a sentence like the spirit is willing but the flesh is [00:05:05] the spirit is willing but the flesh is weak [00:05:06] weak translate it into russian which was the [00:05:09] translate it into russian which was the favorite language for translation in the [00:05:11] favorite language for translation in the 50s and you translate it back [00:05:14] 50s and you translate it back and then you get the vodka is good but [00:05:16] and then you get the vodka is good but the meat is rotten [00:05:18] the meat is rotten so [00:05:19] so uh this was [00:05:21] uh this was less than amusing to the government [00:05:23] less than amusing to the government funding agencies [00:05:24] funding agencies who decided to write a report showing [00:05:26] who decided to write a report showing how really machine translation wasn't [00:05:28] how really machine translation wasn't going anywhere and cut off funding this [00:05:30] going anywhere and cut off funding this led to the first ai winter [00:05:34] led to the first ai winter so what went wrong here [00:05:36] so what went wrong here so there's two things [00:05:39] so there's two things first is that most of the approaches [00:05:42] first is that most of the approaches involved casting problems as logical [00:05:44] involved casting problems as logical 
reasoning which required a search over [00:05:46] reasoning which required a search over an exponentially large state space and [00:05:48] an exponentially large state space and the hardware at the time was just simply [00:05:50] the hardware at the time was just simply too limited [00:05:52] too limited and secondly [00:05:54] and secondly even if the research had infinite [00:05:56] even if the research had infinite compute they would still not be able to [00:05:58] compute they would still not be able to solve ai because there's just too many [00:06:00] solve ai because there's just too many concepts in the world words objects and [00:06:04] concepts in the world words objects and all this information has to somehow be [00:06:06] all this information has to somehow be put [00:06:07] put into the ai system [00:06:10] into the ai system so these grand ambitions weren't [00:06:11] so these grand ambitions weren't realized but nonetheless there were some [00:06:14] realized but nonetheless there were some useful contributions [00:06:16] useful contributions many due to john mccarthy that came out [00:06:18] many due to john mccarthy that came out of this era [00:06:19] of this era first lisp was invented for ai and [00:06:22] first lisp was invented for ai and arguably it's still the world's most [00:06:25] arguably it's still the world's most advanced programming language [00:06:27] advanced programming language garbage collection is something that if [00:06:28] garbage collection is something that if you're programming only in python it [00:06:31] you're programming only in python it allows you to not know what garbage [00:06:32] allows you to not know what garbage collection is and time sharing the [00:06:35] collection is and time sharing the ability to use a single computer by [00:06:37] ability to use a single computer by multiple people was prescient at the [00:06:39] multiple people was prescient at the time [00:06:42] so then fast forward to the 70s and 80s 
[00:06:46] so then fast forward to the 70s and 80s knowledge was the key word [00:06:48] knowledge was the key word and ai researchers thought knowledge was [00:06:50] and ai researchers thought knowledge was the key to combat both the computation [00:06:53] the key to combat both the computation and information limitations of the [00:06:55] and information limitations of the previous era [00:06:56] previous era and at that time expert systems became [00:06:58] and at that time expert systems became very fashionable [00:07:00] very fashionable where a domain expert could encode [00:07:02] where a domain expert could encode knowledge in the form of rules usually [00:07:04] knowledge in the form of rules usually looking like this [00:07:06] looking like this and [00:07:07] and there was a noticeable shift [00:07:09] there was a noticeable shift as well to solve it all optimism from [00:07:11] as well to solve it all optimism from the 50s and 60s was gone and instead [00:07:13] the 50s and 60s was gone and instead researchers focused on very practical [00:07:16] researchers focused on very practical systems targeted at particular domains [00:07:18] systems targeted at particular domains for example chemistry medical diagnosis [00:07:21] for example chemistry medical diagnosis and [00:07:22] and business operations [00:07:24] business operations and [00:07:26] and there were some good things knowledge [00:07:27] there were some good things knowledge did help [00:07:28] did help curb both the information complexity and [00:07:31] curb both the information complexity and also restricted the space state space so [00:07:34] also restricted the space state space so that it alleviated the computation [00:07:36] that it alleviated the computation burden [00:07:38] burden and this was the first time that ai had [00:07:41] and this was the first time that ai had real [00:07:42] real applications on industry [00:07:45] applications on industry but there were obviously problems 
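The if-then rules the lecture alludes to ("rules usually looking like this") can be sketched as a tiny forward-chaining engine. This is a minimal illustration, not any actual historical system; the medical-flavored facts and rule names below are invented for the example.

```python
# Minimal sketch of an expert-system-style rule engine (forward chaining).
# The rules and facts are invented for illustration only.

def forward_chain(facts, rules):
    """Repeatedly fire rules whose conditions all hold until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in facts and all(c in facts for c in conditions):
                facts.add(conclusion)  # rule fires: add its conclusion
                changed = True
    return facts

# "IF fever AND cough THEN suspect flu" style rules (illustrative only).
rules = [
    ({"fever", "cough"}, "suspect_flu"),
    ({"suspect_flu", "high_risk"}, "recommend_test"),
]

derived = forward_chain({"fever", "cough", "high_risk"}, rules)
```

Real expert systems of this era held hundreds of such hand-written rules, which is exactly where the creation and maintenance burden came from.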
[00:07:45] But there were obviously problems. Deterministic rules couldn't handle the complexity and uncertainty of the real world, and moreover the rules quickly became too complex to create and maintain. Here is a quote from Terry Winograd, who, as some of you know, was on the HCI faculty at Stanford; before that he worked at MIT as an AI researcher. This is what he had to say: by the mid-70s he thought it was a dead end. There were just too many complex interactions between all the components, no easy footholds, and you couldn't hold a mental model of what was going on in your head. Moreover, there was a lot of over-promising and under-delivering; the field collapsed again, and it really seemed that history was repeating itself.

[00:08:36] So at this point we're going to leave aside the story of symbolic AI, which dominated AI for multiple decades, and go back in time to 1943 to tell the story of neural AI.

[00:08:48] 1943 is the year often credited as the birth of artificial neural networks. McCulloch and Pitts devised a simple model of the neuron and studied its mathematical properties, but they didn't do anything in the way of learning the model's parameters. In 1949 came the first learning rule, from Donald Hebb, based on the mantra that cells that fire together wire together; it was nice and simple, but it didn't really work. In 1958 Rosenblatt came up with the perceptron algorithm for learning single-layer artificial neural networks, aka linear classifiers, which actually turned out to work really well and was still in use fairly recently. In 1959 Widrow and Hoff came up with an analog for linear regression, along with a multi-layer generalization called MADALINE, which was actually used to eliminate echoes on phone lines; this was one of the first real-world applications of neural networks.

[00:09:54] Then came 1969, and this was a big year. Marvin Minsky and Seymour Papert wrote a small book called Perceptrons, in which they analyzed perceptrons and proved various mathematical properties. One of their results, almost trivial, showed that a single-layer perceptron cannot represent the XOR function. Even though that says nothing about the capabilities of deeper networks, the book is largely credited with shutting down neural network research and with the continued rise of symbolic AI. It's a really interesting piece of history, and I encourage you to go examine it.

[00:10:38] In the 80s, neural networks started coming back. 1980 saw the first convolutional neural network, which was trained in a kind of ad hoc way.
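Minsky and Papert's XOR observation is easy to reproduce. The sketch below (arbitrary hyperparameters and data encoding, not from the lecture) trains a single linear unit with the classic perceptron update: it converges on AND, which is linearly separable, but can never reach perfect accuracy on XOR, which is not.

```python
# Minimal sketch of the perceptron learning rule, illustrating the
# single-layer limitation: AND is linearly separable, XOR is not.

def train_perceptron(data, epochs=25):
    """data: list of ((x1, x2), label) pairs with labels in {-1, +1}."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            if y * (w1 * x1 + w2 * x2 + b) <= 0:  # misclassified point
                w1 += y * x1                      # perceptron update
                w2 += y * x2
                b += y
    return w1, w2, b

def accuracy(data, w1, w2, b):
    correct = sum(
        1 for (x1, x2), y in data
        if y * (w1 * x1 + w2 * x2 + b) > 0
    )
    return correct / len(data)

AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

and_acc = accuracy(AND, *train_perceptron(AND))  # converges: linearly separable
xor_acc = accuracy(XOR, *train_perceptron(XOR))  # stays below 100%: not separable
```

A network with even one hidden layer can represent XOR, which is why the result said nothing about deeper networks.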
In 1986, Rumelhart, Hinton, and Williams reinvented and popularized backpropagation for multi-layer networks, and training became a little more principled. In 1989, Yann LeCun devised a convolutional network that could recognize handwritten digits; it was actually deployed at the USPS to read zip codes, and this was one of the first major success stories of neural networks. But until the mid-2000s, neural network research was still fairly niche, I would say, and the networks were notoriously hard to train. In 2006 this started to change: Geoff Hinton and his colleagues published a paper showing how unsupervised layer-wise pre-training could mitigate some of these effects, and the term "deep learning" started being used around this time as well.

[00:11:46] But it was really 2012, I would say, that was the major break for neural networks. Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton wrote a landmark paper introducing what is now called AlexNet, a convolutional network that delivered huge gains in object recognition. The computer vision community was very skeptical at the time, and almost overnight the result completely transformed the field; computer vision without neural networks today almost feels like a distant memory. 2016 was another big event: AlphaGo defeated Lee Sedol at Go, something experts thought was still many decades away, and that more firmly established deep learning as the dominant paradigm in AI, which continues even to the modern day.

[00:12:42] But let's reflect so far. We have seen two intellectual traditions: symbolic AI, with its roots in logic, and neural AI, with its roots in neuroscience. The two have fought fiercely over the decades over philosophical differences, but I want to suggest some food for thought: maybe there are deeper connections here. Remember the McCulloch and Pitts paper that introduced neural networks, arguably the root of deep learning? They spent most of it talking about how their model can actually encode logical operations. And the game of Go is actually a perfectly logical game, defined by a few elegant, simple rules, but AlphaGo used the powerful pattern-matching capabilities of neural networks to solve this otherwise logical game. So there may be room for more symbiosis than we think.

[00:13:37] So now there's a third and final story that we must tell to complete the picture. This story is not really about AI per se; it's about the influx of ideas from other areas that have helped shape and form a mathematical foundation for AI, and we call this statistical AI. Machine learning is very popular now, but the idea of fitting models from data, which is at the core of machine learning, goes far back, even to Gauss and Legendre, who at the beginning of the 19th century developed least squares for linear regression. Classification also appeared very early in statistics. AI also involves sequential decision-making problems: for deterministic versions there is Dijkstra's algorithm from the algorithms community, and for models with uncertainty, Bellman in control theory created Markov decision processes. And notice that all of these developments largely predated the 40s and 50s, when AI really started springing up.
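The least-squares idea traced back to Gauss and Legendre can be shown in a few lines. This is a generic sketch with made-up data points, using the closed-form solution for simple one-variable linear regression.

```python
# Minimal sketch of least squares for simple linear regression.
# The data points are made up for illustration.

def fit_least_squares(xs, ys):
    """Return slope a and intercept b minimizing sum((a*x + b - y)**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed form: a = cov(x, y) / var(x), b = mean_y - a * mean_x.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Points lying exactly on y = 2x + 1, so the fit should recover a=2, b=1.
a, b = fit_least_squares([0, 1, 2, 3], [1, 3, 5, 7])
```

The same "define an objective, then optimize it" pattern is the statistical-AI viewpoint the course builds on.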
[00:14:51] paying close attention that where we left symbolic ai was at the end [00:14:54] where we left symbolic ai was at the end of the 80s [00:14:55] of the 80s but where neural ai started really [00:14:56] but where neural ai started really gaining traction was [00:14:58] gaining traction was the 2010s so what was going on between [00:15:02] the 2010s so what was going on between and what was going on between was that [00:15:05] and what was going on between was that there was a period where the term ai [00:15:08] there was a period where the term ai wasn't [00:15:08] wasn't really used at least not to the extent [00:15:10] really used at least not to the extent that it is today [00:15:12] that it is today and i think that part of it was to [00:15:14] and i think that part of it was to distance um [00:15:16] distance um to add distance to the failed attempts [00:15:19] to add distance to the failed attempts of the recent account of ai winter [00:15:22] of the recent account of ai winter and also because the goals were just [00:15:23] and also because the goals were just more down to earth people talked about [00:15:25] more down to earth people talked about machine learning [00:15:26] machine learning and then during that period there were [00:15:29] and then during that period there were two paradigms [00:15:31] two paradigms there was bayesian networks developed in [00:15:33] there was bayesian networks developed in 80s by judea pearl which provided [00:15:36] 80s by judea pearl which provided reasoning under uncertainty framework [00:15:40] reasoning under uncertainty framework which is something that a symbolic ai [00:15:42] which is something that a symbolic ai didn't have a satisfying answer for 1995 [00:15:45] didn't have a satisfying answer for 1995 support vector machines were developed [00:15:48] support vector machines were developed derived from ideas from learning theory [00:15:50] derived from ideas from learning theory and optimization 
[00:15:52] and optimization and at that time svms were easier to [00:15:54] and at that time svms were easier to turn tuned than neural networks and [00:15:56] turn tuned than neural networks and really became the favorite tool in [00:15:57] really became the favorite tool in machine learning before deep learning [00:15:59] machine learning before deep learning started taking off again [00:16:03] so to kind of wrap up [00:16:04] so to kind of wrap up you know the there's three stories that [00:16:06] you know the there's three stories that we talked about symbolic ai [00:16:09] we talked about symbolic ai took a top-down approach [00:16:11] took a top-down approach and really failed to deserve on its [00:16:13] and really failed to deserve on its original promise but it did offer a [00:16:16] original promise but it did offer a vision and built impressive artifacts [00:16:18] vision and built impressive artifacts like question answering and dialogue [00:16:20] like question answering and dialogue system managing trying to do this on [00:16:22] system managing trying to do this on ancient hardware in the 60s [00:16:26] ancient hardware in the 60s neural ai took a completely different [00:16:28] neural ai took a completely different approach proceeding bottom up starting [00:16:30] approach proceeding bottom up starting with simple perceptual tasks which the [00:16:33] with simple perceptual tasks which the symbolic community wasn't interested at [00:16:35] symbolic community wasn't interested at a time i compared machine translation [00:16:37] a time i compared machine translation with removing echoes on phone lines for [00:16:39] with removing echoes on phone lines for example but in the end it offered a [00:16:42] example but in the end it offered a class of models and a way of thinking [00:16:44] class of models and a way of thinking about data that has proven [00:16:47] about data that has proven capable of conquering [00:16:49] capable of conquering today's 
ambitious problems [00:16:51] today's ambitious problems and finally statistical ai [00:16:54] and finally statistical ai foremost [00:16:55] foremost for us will offer mathematical rigor and [00:16:57] for us will offer mathematical rigor and clarity for example in the course when [00:16:59] clarity for example in the course when we define define objective functions [00:17:02] we define define objective functions separate from optimization or have a [00:17:04] separate from optimization or have a language to talk about the complexity of [00:17:06] language to talk about the complexity of a model in learning [00:17:07] a model in learning these ideas and language all stem from [00:17:10] these ideas and language all stem from statistical ai and the course will [00:17:13] statistical ai and the course will actually be presented mostly through the [00:17:15] actually be presented mostly through the lens of statistical ai but i want to [00:17:17] lens of statistical ai but i want to highlight that all three views are kind [00:17:19] highlight that all three views are kind of compatible and just offer different [00:17:22] of compatible and just offer different advantages on the same underlying [00:17:24] advantages on the same underlying ideas [00:17:26] ideas stepping back you know the modern world [00:17:28] stepping back you know the modern world of ai is kind of like new york city it's [00:17:30] of ai is kind of like new york city it's a melting pot that has drawn largely [00:17:33] a melting pot that has drawn largely from a lot of different fields [00:17:34] from a lot of different fields statistics algorithms neuroscience [00:17:36] statistics algorithms neuroscience economics and it's really a symbiosis [00:17:39] economics and it's really a symbiosis between all these fields and how they [00:17:42] between all these fields and how they come together and allow you to tackle [00:17:44] come together and allow you to tackle real-world applications that makes our 
[00:17:46] real-world applications that makes our ai so rewarding [00:17:50] okay so that ends the [00:17:53] okay so that ends the uh the ai [00:17:55] uh the ai history module [00:17:57] history module you can read much more about it at a few [00:17:59] you can read much more about it at a few links at the end of these slides ================================================================================ LECTURE 003 ================================================================================ Artificial Intelligence Today | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=C0IhR4D5KYc --- Transcript [00:00:05] so if i had to use one word to describe [00:00:08] so if i had to use one word to describe ai today it would be [00:00:10] ai today it would be surreal [00:00:12] surreal it's kind of hard for me to imagine that [00:00:13] it's kind of hard for me to imagine that 10 years ago it was very much an [00:00:15] 10 years ago it was very much an academic endeavor and now countries are [00:00:18] academic endeavor and now countries are forming national strategies around they [00:00:20] forming national strategies around they are what [00:00:22] are what so the ai index [00:00:23] so the ai index is a project that aims to track [00:00:27] is a project that aims to track the status of ai and each year they [00:00:28] the status of ai and each year they release an annual report [00:00:30] release an annual report um here are some quotes from this report [00:00:32] um here are some quotes from this report compute doubling every 3.4 months um the [00:00:35] compute doubling every 3.4 months um the conference neurops [00:00:37] conference neurops increased over 800 percent in the last [00:00:40] increased over 800 percent in the last eight years number of jobs is also going [00:00:42] eight years number of jobs is also going up and so quantitatively at least we see [00:00:45] up and so quantitatively at least we see that you know shouldn't be 
surprising to [00:00:47] that you know shouldn't be surprising to people that ai [00:00:49] people that ai is just becoming a big deal [00:00:52] is just becoming a big deal qualitatively [00:00:54] qualitatively what [00:00:55] what i think is really interesting is that ai [00:00:57] i think is really interesting is that ai is transitioning [00:00:59] is transitioning from being in the lab to the real world [00:01:01] from being in the lab to the real world for a long time ai was limited to [00:01:03] for a long time ai was limited to relatively artificial environments which [00:01:05] relatively artificial environments which was useful for developing [00:01:07] was useful for developing methods [00:01:08] methods but now we're seeing real world [00:01:10] but now we're seeing real world deployment in ways that really impact [00:01:13] deployment in ways that really impact people's lives [00:01:15] people's lives and i want to stress that ai [00:01:17] and i want to stress that ai like any technology is an amplifier it [00:01:19] like any technology is an amplifier it makes what is good better and makes what [00:01:22] makes what is good better and makes what is bad worse and we really need to be [00:01:24] is bad worse and we really need to be aware of both sides [00:01:26] aware of both sides so let me start with the positives the [00:01:28] so let me start with the positives the prospects [00:01:30] prospects so here are some examples in which ai [00:01:33] so here are some examples in which ai has been well beneficial so in the last [00:01:37] has been well beneficial so in the last decade speech recognition question [00:01:38] decade speech recognition question answerings have gone [00:01:40] answerings have gone remarkably good and now you can talk to [00:01:42] remarkably good and now you can talk to your favorite assistant and expect some [00:01:44] your favorite assistant and expect some basic though obviously not perfect level [00:01:46] basic though 
obviously not perfect level of language understanding my [00:01:48] of language understanding my three-year-old is growing up thinking [00:01:50] three-year-old is growing up thinking that talking to computers is perfectly [00:01:51] that talking to computers is perfectly normal [00:01:53] normal and you know search engines like google [00:01:55] and you know search engines like google have told us [00:01:56] have told us that [00:01:57] that you know enabling [00:01:59] you know enabling power comes that comes with um [00:02:02] power comes that comes with um being able to tap into the world's rich [00:02:04] being able to tap into the world's rich information and now taking one step [00:02:07] information and now taking one step further these assistants allow uh this [00:02:09] further these assistants allow uh this information to be more efficiently and [00:02:12] information to be more efficiently and naturally accessible which could be [00:02:14] naturally accessible which could be especially useful for people do not have [00:02:16] especially useful for people do not have the means to use a computer [00:02:20] so there's language barriers in the [00:02:21] so there's language barriers in the world that pose significant [00:02:24] world that pose significant significant challenges to travelers [00:02:27] significant challenges to travelers immigrants [00:02:28] immigrants businesses minority subject communities [00:02:31] businesses minority subject communities and so connecting people is very [00:02:33] and so connecting people is very valuable so machine translation aims to [00:02:35] valuable so machine translation aims to overcome these barriers [00:02:36] overcome these barriers machine translation has come a long way [00:02:39] machine translation has come a long way since the 60s and while it's far from [00:02:41] since the 60s and while it's far from perfect it is really good enough for [00:02:43] perfect it is really good enough for someone to get the 
basic gist of a document written in a different language, or to have a real-time conversation with someone speaking in a completely different language.

[00:02:53] Autonomous driving will someday hopefully be able to reduce the number of accidents and congestion, but a major challenge is to recognize what is going on in an unstructured environment. Computer vision has made a lot of progress towards recognizing these objects, but there is still headroom to ensure sufficient reliability. An interesting application is visual assistive technology. An example is Seeing AI from Microsoft Research, where you point a camera at something and it narrates what's going on there, and this obviously could be a game changer for the visually impaired. Auto-captioning technology is the opposite, which is also potentially very impactful, turning sound into
sight.

[00:03:41] Healthcare is another big area that's growing in importance, both for diagnosis and therapeutic development, especially in areas where there is a shortage of clinical expertise. An example of this is detecting diseases based on chest X-rays, or diagnosing diabetic retinopathy, which is one of the major challenges in AI and healthcare these days. There's also an interesting recent dataset that shows images of COVID-19-infected cells and how they respond to certain drugs, with the hope that one day we can find drugs that can treat late-stage COVID-19.
[00:04:30] Poverty is a big problem in the world, but even figuring out the areas in greatest need is challenging. So recently people have been using satellite imagery to try to figure this out, because gathering survey data on the ground is very expensive. Using machine learning, you can look at satellite images and try to predict various wealth indicators. This could be really useful for governments and NGOs to take action and monitor progress.

[00:05:04] So this all sounds great, right? So what's the catch? Well, there are a lot of things that one has to be aware of. I just want to give you a general idea of the space, and I'm going to go fairly quickly here. First is energy consumption. There is a genuine cost to training the high-performing models that we're seeing today. If we look at NLP, there has been a trend of training more and
more large language models. Back in 2018, which is like ancient history now, models had only about 100 million parameters. Then BERT came along, which some of you might have heard of, with a big splash, at around 300 million. Then in January this year Microsoft released a model with 17 billion parameters, and to top it off, in May OpenAI released a 10-times-larger model with 175 billion parameters. So this is big. Last year there was a paper published that talked about the carbon footprint of training these models. They looked at a transformer with 200 million parameters, which would be around here on this graph, and they showed that training even this, if you use neural architecture search, was five times the amount of CO2 emissions of the entire lifetime of a U.S. car. So now I'll leave you to speculate what GPT-3's environmental
footprint is. So, needless to say, a lot of people are actively trying to reduce these models' size and improve efficiency without sacrificing accuracy.

[00:06:49] Privacy is another big area. Machine learning algorithms have really been developed assuming that data is just sitting there in one place and is fully accessible. But our mobile phones generate a wealth of information, and we might not want to be sending all that information up to some big internet company. Recently there's been a lot of active work in privacy-preserving machine learning, which allows some of the learning to happen on-device in a decentralized way, transmitting only various essential statistics to a central server.

[00:07:21] Security is another major challenge, especially in high-stakes applications like autonomous driving and face identification for
authentication. Here, models not only need to be accurate but robust against attackers and malicious behavior, which we know exists in the world. Researchers have shown that you can construct adversarial examples: if you take images of stop signs and you post these stickers on them, you can get a state-of-the-art system to think that these are speed limit signs. Or you can actually buy these cool-looking glasses that will trick face ID into thinking that you're some celebrity that you're not. So guarding against these attacks, which is kind of frightening, is still a wide-open problem.

[00:08:09] Bias was mentioned in the chat. This is something that's maybe less spectacular in terms of sudden impact, but I think it's more pernicious. Here's an example from machine translation. If you take Hungarian, and you have the
words he and she, which are not differentiated, and you translate them into English, the machine translation model has to hallucinate the gender, and you can reveal all sorts of stereotypes that the model is harboring. For example: she is a nurse, baker, wedding organizer, but he is a scientist, engineer, teacher, and CEO. And there's a lot of active work showing how hard it is to actually remove these biases. So I want to say that machine learning algorithms are based on quote-unquote objective mathematical principles, but the trained models are trained to latch on to statistics in the data, and the data comes from society. So any biases in society are reflected in the data and propagated to model predictions, and worse, sometimes they're even amplified.

[00:09:21] So here's another case study. Northpointe is a company that produces software called COMPAS that assesses whether someone is going to
commit a crime again. And ProPublica, a nonprofit organization that does investigative journalism, came out and said: whoa, you are not being fair, because given that an individual did not reoffend, Black people are twice as likely to be wrongly classified as higher risk than white people. But Northpointe defended themselves by saying that given a risk score of seven, sixty percent of white people reoffended and sixty percent of Black people reoffended, so therefore it's fair. Both of these actually turn out to be simply different desiderata of fairness, and unfortunately there are actually impossibility results that say you can't have these two and a third criterion hold for imperfect classifiers at the same time.
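The two desiderata can be made concrete in code. This is a minimal sketch with invented toy numbers (not the actual COMPAS data): ProPublica's criterion compares false positive rates across groups, while Northpointe's compares how often a high-risk label turns out to be correct.

```python
# Toy sketch of the two fairness desiderata (invented numbers, NOT real COMPAS data).
# Each record: (group, labeled_high_risk, actually_reoffended)
records = [
    ("A", True, True), ("A", True, True), ("A", True, False),
    ("A", True, False), ("A", False, False),
    ("B", True, True), ("B", True, False), ("B", False, False),
    ("B", False, False), ("B", False, False),
]

def false_positive_rate(group):
    """ProPublica-style criterion: among people who did NOT reoffend,
    what fraction were wrongly labeled high risk?"""
    innocent = [r for r in records if r[0] == group and not r[2]]
    return sum(r[1] for r in innocent) / len(innocent)

def precision(group):
    """Northpointe-style criterion: among people labeled high risk,
    what fraction actually reoffended?"""
    flagged = [r for r in records if r[0] == group and r[1]]
    return sum(r[2] for r in flagged) / len(flagged)

# Calibration-style parity holds: both groups' high-risk labels are right half the time.
print(precision("A"), precision("B"))                      # 0.5 0.5
# Yet error-rate parity fails: group A's non-reoffenders are flagged far more often.
print(false_positive_rate("A"), false_positive_rate("B"))  # 2/3 vs 0.25
```

With different base rates across the groups, the impossibility results referred to above say an imperfect classifier cannot satisfy criteria like these simultaneously.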
And given that these algorithms are actually being deployed and really impacting people's lives in a huge way, this indicates that we not only need to understand the technical implications of all these algorithms, but also think about the philosophical and policy-related issues as well.

[00:10:52] So this one's kind of scary: generating fake content. Deep learning has enabled us to generate deepfakes, such as Obama saying things that he never did, which you can find online. Or, more recently, this is a blog post written by our friend GPT-3 that made its way to number one on Hacker News. So it's completely clear, at least to me, that we've lost the ability to tell the difference between real and fake content, and given the ease and scale at which fake content can now be generated, bad actors spreading disinformation is, I think, a major threat to our society.

[00:11:32] Finally,
AI systems are being deployed in dynamic environments, where you have systems which are making predictions: serving you search results, giving you recommendations, serving ads. Users are taking actions, essentially by clicking, and these actions are recorded as data. This data is used to retrain the system, which further reinforces these actions. So I think there's a very dangerous feedback loop inherent in machine learning, where all these biases are amplified and polarized, and it leads to quite unstable behavior. So I think a major open research challenge is to figure out how to build more robust systems that are not as susceptible to these unstable dynamics.

[00:12:24] So, to conclude, I just want to stress that AI technology is an amplifier. We've seen that AI can, and promises to, be quite beneficial to society, reducing accessibility
barriers and improving efficiency. But on the other hand, it can also amplify biases, introduce new security risks, and centralize power in ways that were kind of unprecedented before. I just want you to keep these issues in mind as we go through the course. Just because you can build it doesn't mean you should, and if we're not careful, we could potentially build something that does more harm than good. Moreover, figuring out the best way to tread the line between positive prospects and negative risks is, I think, something that requires a deep technical understanding, especially if we are to develop novel solutions, and that's something that this course is going to equip you with.

[00:13:36] So that concludes this module.

================================================================================ LECTURE 004 ================================================================================
Artificial Intelligence and Machine Learning 1 - Overview | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=mtrYwgIrRNk
--- Transcript

[00:00:05] Hi, in this module I'm going to be talking about machine learning and give you an overview of all the topics we'll cover. So remember that machine learning is the process of taking data and converting it into models, and with those models you can go and perform inference and answer all sorts of questions. We're going to focus on reflex-based models; these are models, including linear classifiers and neural networks, in which inference is very fast and feed-forward, which makes them very attractive.

[00:00:34] So in a nutshell, this is what a reflex-based model is. We'll call a reflex-based model a predictor, and the predictor takes as input some x and produces some output y. In general, x can be something arbitrary, like an image or a text, and y is going to be restricted, and that particular restriction is going to determine what type of
prediction task we are talking about. We'll consider two common cases of prediction tasks here. The first is binary classification. In binary classification, the predictor is also called a classifier, and the output y is called a label, and that label can either be plus one for the positive class or minus one for the negative class.

[00:01:21] Some examples of binary classification problems: there's fraud detection, where x is a credit card transaction and we're trying to predict y, whether there's fraud or no fraud, so that the transaction can be blocked or not. Another example is moderating online discussion forums: the input x is an online comment, a piece of text, and you're trying to predict y, whether it's toxic or not, so that the comment can be flagged or taken down appropriately.
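The classifier abstraction above (input x in, label y in {+1, -1} out) can be sketched as a simple linear scorer; the fraud-detection feature names and weights here are invented purely for illustration.

```python
# Minimal sketch of a binary classifier f: x -> {+1, -1}.
# The features and weights are invented for illustration, not a real fraud model.

def predict(x, weights):
    """Score the input's features; +1 means positive class, -1 negative."""
    score = sum(weights.get(name, 0.0) * value for name, value in x.items())
    return +1 if score >= 0 else -1

# Hypothetical credit-card transaction, encoded as a feature dictionary:
weights = {"amount_usd": 0.001, "foreign_country": 2.0, "known_merchant": -3.0}
x = {"amount_usd": 500.0, "foreign_country": 1.0, "known_merchant": 1.0}
print(predict(x, weights))  # score = 0.5 + 2.0 - 3.0 = -0.5, so -1 (predict no fraud)
```

Inference here is a single fast feed-forward pass over the features, which is exactly what makes reflex-based models attractive.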
And finally, here's an example from physics. After the Higgs boson was discovered, scientists wanted to know: how does it decay? The Hadron Collider collected a bunch of data, which includes measurements of events; here x is a measurement of a particular event, and you're trying to predict whether it was a decay event or simply background.

[00:02:15] The second type of task we're going to consider is regression. In regression, y is going to be a real number, and it's generally known as the response. Here are some examples of regression problems. In poverty mapping, x is a satellite image, and you're trying to predict y, which is the asset wealth index of the homes in that area of the satellite image. In housing, you might want to predict the price using information about the house: location, number of bedrooms, year built.
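A regressor has the same shape as a classifier, except the real-valued score itself is the output y; the housing features, weights, and bias below are invented for illustration.

```python
# Minimal sketch of a regressor f: x -> real number (the "response").
# Feature names, weights, and bias are invented for illustration.

def predict_price(x, weights, bias):
    """Return a real-valued prediction, e.g. a house price in dollars."""
    return bias + sum(weights.get(name, 0.0) * value for name, value in x.items())

weights = {"bedrooms": 50_000.0, "year_built": 100.0}
x = {"bedrooms": 3.0, "year_built": 1990.0}
print(predict_price(x, weights, bias=10_000.0))  # 10000 + 150000 + 199000 = 359000.0
```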
And finally, you might be interested in predicting arrival times: given where you're going, what the conditions are at the time, and what time of day it is, you're trying to predict y, which is the time of arrival.

[00:03:05] So the main difference between regression and classification is that in classification y is a discrete entity, and in regression it is a continuous entity.

[00:03:17] The final thing we're going to talk about is structured prediction. Structured prediction is a little bit of a catch-all: in structured prediction, y is simply a complex object. Some examples include machine translation, where x, the input, is a sentence in one language and y is its translation in another language. Dialogue can also be cast as structured prediction: you're given a conversational history between a user and an agent, for example in a virtual assistant setting, and you're trying to predict y, which is the next utterance that the agent should say. Another example is image captioning, which might be
useful for visual assistive technologies: x is an image of a scene, and y is a sentence describing or narrating that scene.

[00:04:05] Image segmentation, which is useful for autonomous driving, takes an image of a scene as x and produces y, which is a segmentation of that scene into regions corresponding to objects in the real world.

[00:04:20] So it might seem daunting at first to be able to generate segmentations or sentences or texts, but there's a secret here, which is that many structured prediction problems can actually be decomposed into a sequence of multi-class classification problems, and this allows us to leverage the machinery that we'll develop for multi-class classification in structured prediction.
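The decomposition idea above can be sketched for generating a caption token by token, where each step is one multi-class classification over a vocabulary; the scorer here is a hard-coded stand-in, not a trained model.

```python
# Sketch: reduce structured prediction (producing a token sequence) to a series
# of multi-class classification steps. The scorer is a hard-coded stand-in; a
# real system would use a trained model to score (prefix, next-token) pairs.
VOCAB = ["a", "dog", "park", "runs", "<stop>"]

def score(prefix, token):
    # Placeholder scores, just to make the loop below run end to end.
    preferred = {(): "a", ("a",): "dog", ("a", "dog"): "runs",
                 ("a", "dog", "runs"): "<stop>"}
    return 1.0 if preferred.get(tuple(prefix)) == token else 0.0

def predict_sequence(max_len=10):
    prefix = []
    for _ in range(max_len):
        # One multi-class classification: choose the best next token from VOCAB.
        best = max(VOCAB, key=lambda t: score(prefix, t))
        if best == "<stop>":
            break
        prefix.append(best)
    return prefix

print(predict_sequence())  # ['a', 'dog', 'runs']
```

Each loop iteration is an ordinary multi-class decision, so the same classification machinery applies even though the final output y is a complex object.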
[00:04:47] So here is the roadmap of the rest of the modules in the machine learning unit. First, we're going to start with regression and classification, the bread and butter of machine learning. We're going to focus on the simplest setting: linear models, which we train using gradient descent. Then we're going to step over to algorithms and introduce stochastic gradient descent, which is going to give us major speedups over gradient descent. Next we're going to hop over to models and improve on linear models. First we'll show that even linear models can be pushed to their limits by using non-linear features with the linear machinery, and we can use feature templates to organize the set of features that we have. Then we'll talk about neural networks, which also allow you to have nonlinear predictors, but allow these nonlinearities to be learned from data. Following neural networks, we're going to look at the backpropagation algorithm for computing gradients automatically, so
manually, so you can train neural networks. [00:05:55] We're going to hop back over here and talk about differentiable programming, which is a generalization or extension of neural networks that will enable us to build all sorts of complicated deep learning models out of building blocks. [00:06:10] All of this is generally done in the context of supervised learning; we're going to touch on unsupervised learning a little bit and introduce the classical k-means algorithm for clustering points. [00:06:20] And finally we're going to end on a few notes. First is generalization: the question of, if you train a machine learning model on a particular set of data, is it able to generalize to a new set of examples? [00:06:35] And finally I'm going to talk about best practices like cross-validation, and how you do machine learning in practice. So that concludes this module.
================================================================================ LECTURE 005 ================================================================================ Artificial Intelligence & Machine Learning 2 - Linear Regression | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=nEWNNt2KmfQ --- Transcript [00:00:05] Hi, in this module I'm going to cover the basics of linear regression. [00:00:10] The story of linear regression begins on January 1, 1801. The Italian astronomer Piazzi looked up at the night sky and discovered something, which he named Ceres. He didn't know what it was, whether a comet or a planet, but he did make some observations of its location before it was obscured by the sun. [00:00:28] The data he collected looked like this: at a particular time, two numbers which represent the location of Ceres in the night sky. [00:00:40] So now the big question at the time was when and where Ceres was going to be observed again as it emerged from behind the sun. [00:00:48] All the top astronomers at the time tried to analyze this data and figure out the answer.
[00:00:55] So Carl Friedrich Gauss, the famous German mathematician, took Piazzi's data, created a model of Ceres's orbit, and made a prediction. [00:01:04] This prediction was actually wildly different from all the other predictions that other astronomers made, but in December, Ceres was located, and Gauss's prediction was by far the most accurate. [00:01:16] Now there's an interesting story here: Gauss was actually very secretive about what his method was, and in 1805 the French mathematician Legendre was actually the first to publish the method, before Gauss could publish in 1809, even though Gauss had had the method back in 1795.
The method here is none other than least squares linear regression, which is the topic of this module. [00:01:42] So here is the framework. We are given some training data, which consists of a set of examples; each example consists of an input x and an output y: (1, 1), (2, 3), (4, 3). [00:01:57] We can visualize these examples on a 2D plot, plotting y, the output, against x, the input: here is (1, 1), here is (2, 3), and here is (4, 3). [00:02:12] What we want to do is take this data and have a learning algorithm produce a predictor f, which in this case is, let's say, a line. What the predictor allows us to do is take a new input, such as this 3 here, send it through, and produce an output, 2.71, corresponding to this point on the line. [00:02:42] There are three design decisions that we need to make to flesh out this framework. First: what are the possible predictors that
the learning algorithm is allowed to output? Is it only lines, or curves as well? This is the question of the hypothesis class. [00:02:57] Second question: how good is a predictor? The answer is going to be framed in terms of defining a loss function that judges each individual predictor in the hypothesis class. [00:03:11] And finally, how do we actually compute the best predictor? There are a lot of predictors out there, and even if we have the loss function, how do we go searching through them? This is the question of the optimization algorithm. [00:03:25] So this is a recipe that we're going to see over and over again; it's kind of like a build-your-own learning algorithm. [00:03:35] So we're going to start with the first question: what is the hypothesis class? Here is the predictor that we were looking at: f(x) = 1 + 0.57x
[00:03:47] and that corresponds to this red line. [00:03:51] Here's another one, a purple predictor, which has an intercept of 2 and a slope of 0.2. In general you can consider predictors of the form f(x) = w1 + w2 x, for arbitrary w1, the intercept, and w2, the slope. [00:04:15] So now we're going to generalize this using vector notation. Let's take w1 and w2 and pack them together into a vector, which we will call w; this is called the weight vector. [00:04:30] We're also going to define a feature extractor, also known as a feature map, phi. So phi takes an arbitrary input x and returns the vector (1, x), at least in this case, and (1, x) is known as the feature vector. [00:04:48] So now we can simply rewrite the equation up here in vector notation: we're going to write f sub w to denote
that this predictor depends on the weights: f_w(x) = w · phi(x). [00:05:03] This w · phi(x), which we'll see over and over again, is called the score. Here's an example: if you feed 3 into this predictor, then what we're doing is taking the weight vector and dotting it with the feature vector applied to 3. Remember, the feature vector is (1, x), so that's (1, 3) here, and if you take the dot product, 1 times 1 plus 0.57 times 3, that gives you 2.71. [00:05:36] So now, finally, the hypothesis class is defined as the set script F of all predictors f_w, where w is an arbitrary vector (an arbitrary intercept and slope). [00:05:56] Okay, so that defines the hypothesis class that we're going to be working with. [00:06:02] So now let's turn to the second design
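The score computation described here can be sketched in a few lines of numpy (the weight vector (1, 0.57) is the red predictor from the slides):

```python
import numpy as np

def phi(x):
    # feature extractor: maps an input x to the feature vector (1, x)
    return np.array([1.0, x])

def f(w, x):
    # the predictor: the score w . phi(x)
    return w.dot(phi(x))

w = np.array([1.0, 0.57])      # weight vector: (intercept, slope)
print(round(f(w, 3), 2))       # 1*1 + 0.57*3 = 2.71
```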
decision: how good is a predictor? [00:06:07] Let's take the predictor that we were looking at, the red one, and let's look at some training data. This is the training data that we had before; let's plot the predictor and the three data points: (1, 1), (2, 3), and (4, 3). [00:06:23] Intuitively, how good a predictor is is how well it fits the training data, and we're going to quantify that by measuring the distance between the prediction and the target. This difference is called the residual, and we're going to measure the residual for each of our points and take that into account. [00:06:44] Formally, we're going to define a loss function, which is a function of an example (x, y) and a particular weight vector, and it's going to be equal to the prediction f_w(x) minus the target y (that's the residual), squared. So that
is called the squared loss. [00:07:11] As an aside, you could also take the absolute value here, which gives you the absolute deviation loss, but we're going to stick with the squared loss for mathematical convenience. [00:07:19] On these three examples we can compute the loss. We take (1, 1): we dot the weight vector with the feature vector, that's the prediction; we subtract off the target and square it, and that gives you 0.32. The second example is (2, 3) and the third example is (4, 3); each one gives you a loss, which corresponds to the square of the length of these dashed lines. [00:07:51] So now we can define the training loss of a particular weight vector to be simply the average over the losses. Formally, this is going to be a sum over all the examples in our training set of the loss function of each example with respect to that weight vector, and then
finally we're going to divide by the number of points in the training set. [00:08:23] So in this example we just average these three numbers and we get about 0.38. Okay, so that is how we define the squared loss, and the training loss in terms of the squared loss. [00:08:37] So here is the training loss from the previous slide, and we can visualize it: for every single weight vector, we can stick it in and get out a number. Fortunately w is only two-dimensional here, so we can actually plot this: here is the plot over w1 and w2, and every point gives you a training loss on the z-axis. Red denotes high loss, blue denotes low loss, and so it's natural to think about how you would find the point with the minimum training loss. That's captured mathematically as the minimum over w of TrainLoss(w), and this is the optimization problem that we want to solve. [00:09:23] So now the
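Putting the squared loss and its average together, here is a minimal sketch using the lecture's three examples and the red predictor w = (1, 0.57):

```python
import numpy as np

points = [(1, 1), (2, 3), (4, 3)]   # training examples (x, y)

def phi(x):
    return np.array([1.0, x])       # feature vector (1, x)

def loss(w, x, y):
    # squared loss: (prediction - target)^2
    return (w.dot(phi(x)) - y) ** 2

def train_loss(w):
    # average of the per-example losses
    return sum(loss(w, x, y) for x, y in points) / len(points)

w = np.array([1.0, 0.57])
print(round(train_loss(w), 2))      # 0.38 (per-example losses ~0.32, 0.74, 0.08)
```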
third question is: how do we compute the best predictor? Fortunately we already have a well-defined goal: we want to find the weight vector that minimizes the training loss. [00:09:35] We're going to adopt a very simple strategy called "follow your nose": you start at a particular w, you sniff around, and you move in the direction that seems like it's going to reduce your loss the most. [00:09:48] More mathematically, we're going to define the gradient as the direction that increases the training loss the most, and importantly we want to go in the opposite direction, because we want to decrease the training loss, not increase it. [00:10:02] Pictorially, what the follow-your-nose strategy, or gradient descent, looks like is: you start at some w, you follow the gradient, and you end up here; then you're going to
compute the gradient again, end up here, and you might bounce around a bit, but hopefully you'll decrease the loss, on average, over time. [00:10:23] Okay, so here's the pseudocode for gradient descent. We initialize w to be something, let's say all zeros for simplicity. Then we repeat big T times, where big T is the number of epochs: we take our old value of the weight vector and subtract out some eta, which is called the step size (we'll get into it a little bit later), times the gradient of the training loss at w. [00:11:05] Okay, so that's it: there are three lines, and really only one line of interest here, and that's all there is to gradient descent, at least at an abstract level. [00:11:19] So all that remains is to actually compute
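In symbols, the loop body is this single update (eta is the step size, big T the number of epochs):

```latex
w \leftarrow w - \eta \, \nabla_w \mathrm{TrainLoss}(w),
\qquad \text{repeated for } t = 1, \dots, T
```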
the gradient. Remember, here is our objective function: the training loss is the average over the individual losses, where I've expanded out the squared loss; but gradient descent is actually much more general than squared loss, or even machine learning. [00:11:40] Now we just need to compute the gradient, and if you remember your calculus, here's how you do it. We take the gradient with respect to w of TrainLoss(w); there are a lot of symbols here, but remember we are differentiating with respect to w, not x, not y, not phi. [00:11:58] This is equal to the gradient of this expression: the constant out front stays put, the gradient can be pushed inside, and this is a sum, so by linearity the gradient can be pushed inside the sum over the training set. [00:12:15] And now the interesting thing happens: here is something squared, and for the gradient of something squared
you bring down the 2, and then you have the same something, which is, if you remember, the residual, times the gradient of what's inside. What's inside is w · phi(x) − y; phi(x) is a constant and y is a constant, so the gradient of the residual is just phi(x). [00:12:45] Notice there's something interesting I want to point out here: the gradient can be expressed as the residual times the feature vector, where the residual is the prediction minus the target. [00:13:01] So intuitively, if the prediction equals the target, then the gradient is zero and nothing will happen. If the prediction is not equal to the target, then the gradient will be in the direction that pushes the prediction further from the target; and remember we're always minimizing, so we subtract that off, which will move the
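Written out, the derivation described above is:

```latex
\nabla_w \mathrm{TrainLoss}(w)
  = \nabla_w \, \frac{1}{|\mathcal{D}_{\mathrm{train}}|}
    \sum_{(x,y)\in\mathcal{D}_{\mathrm{train}}} \bigl(w \cdot \phi(x) - y\bigr)^2
  = \frac{1}{|\mathcal{D}_{\mathrm{train}}|}
    \sum_{(x,y)\in\mathcal{D}_{\mathrm{train}}}
    2 \underbrace{\bigl(w \cdot \phi(x) - y\bigr)}_{\text{residual}} \, \phi(x)
```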
weights in the right direction. [00:13:24] Okay, so let's walk through gradient descent for our example. Here are the training examples again, here is the expression for the gradient that we just computed on the previous slide, and here is the gradient update, where I've taken the liberty of substituting the step size 0.1, just for simplicity. [00:13:43] So we start with w = (0, 0), and then we plug (0, 0) into this training-loss gradient expression, this somewhat nasty-looking thing. It's simply the average over the three examples: one third times the sum of the first, second, and third examples, where each example contributes the prediction minus the target, times the feature vector of that example. [00:14:14] I'll let you go through the details here, but if you do the math you get
(-4.67, -12.67); you multiply by the step size and you get this weight vector. [00:14:29] Okay, so in the second iteration you take this weight vector, stick it into the expression all over again, compute a new gradient, subtract that gradient times 0.1 from the weight vector, and you get a new weight vector. [00:14:46] Then you keep repeating and repeating, and after maybe 200 iterations you end up with something like this. Something interesting happens: if you're lucky, the gradient at the end will be zero. What does zero mean? If you subtract zero, you get the same thing, which means that gradient descent has converged: by subtracting off the gradient you're not going to move anywhere, so you might as well stop. The stopping point is the weight vector (1, 0.57), which is indeed the red predictor.
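The first update can be checked numerically; here is a sketch with step size 0.1, starting from w = (0, 0), as in the lecture:

```python
import numpy as np

points = [(1, 1), (2, 3), (4, 3)]

def phi(x):
    return np.array([1.0, x])

def gradient_train_loss(w):
    # average of 2 * residual * feature vector over the training set
    return sum(2 * (w.dot(phi(x)) - y) * phi(x) for x, y in points) / len(points)

w = np.zeros(2)
g = gradient_train_loss(w)
print(np.round(g, 2))    # about (-4.67, -12.67)
w = w - 0.1 * g          # first gradient-descent step
print(np.round(w, 3))    # about (0.467, 1.267)
```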
So, just to concretize this even more, let's do gradient descent in Python. [00:15:30] Okay, so I'm going to pull out the terminal here. In practice you probably wouldn't implement gradient descent from scratch, except if you're just trying to learn about gradient descent, but for pedagogical purposes let me do this. [00:15:47] I'm going to do this in a very bare-bones way, using numpy rather than PyTorch or something that can do gradients for you. [00:16:00] First I'm going to define our training examples as (1, 1), (2, 3), and (4, 3).
i believe those are the training [00:16:08] i believe those are the training examples let me just double check over [00:16:11] examples let me just double check over here one one two three four three [00:16:13] here one one two three four three okay so now i have to define a feature [00:16:16] okay so now i have to define a feature vector of x which is remember is one x [00:16:21] vector of x which is remember is one x this is just a [00:16:23] this is just a numpy array [00:16:24] numpy array um i'm going to initialize uh the weight [00:16:28] um i'm going to initialize uh the weight vector with let's call this initial wave [00:16:30] vector with let's call this initial wave vector [00:16:33] vector and this is just going to be all zeros [00:16:35] and this is just going to be all zeros vector of dimension two which is going [00:16:38] vector of dimension two which is going to match the dimensionality of phi [00:16:42] okay so now i need to define the [00:16:44] okay so now i need to define the training loss [00:16:46] training loss training loss takes away vector [00:16:48] training loss takes away vector and i'm going to actually go to the [00:16:51] and i'm going to actually go to the previous slide here and it's just [00:16:53] previous slide here and it's just basically copying down math and turning [00:16:55] basically copying down math and turning it into code [00:16:57] it into code so this is 1 over [00:16:59] so this is 1 over the number of training examples [00:17:03] the number of training examples times the sum [00:17:06] times the sum um and the sum is over [00:17:08] um and the sum is over for all training examples x y and [00:17:11] for all training examples x y and training examples [00:17:13] training examples and for each one i'm going to do w [00:17:16] and for each one i'm going to do w uh dot e of x it's really literally the [00:17:20] uh dot e of x it's really literally the same thing minus y [00:17:21] same thing minus y and i'm going to 
take this expression [00:17:23] and i'm going to take this expression the residual and i'm going to square it [00:17:27] the residual and i'm going to square it okay so let's make this a little bit [00:17:28] okay so let's make this a little bit bigger [00:17:29] bigger okay so that's the training loss [00:17:32] okay so that's the training loss um okay now i need to take the gradient [00:17:35] um okay now i need to take the gradient so i'm going to cheat a little bit and [00:17:36] so i'm going to cheat a little bit and just copy that down here and edit it [00:17:39] just copy that down here and edit it so the gradient of the training loss is [00:17:42] so the gradient of the training loss is going to be [00:17:44] going to be 2 times the residual [00:17:47] 2 times the residual times [00:17:48] times e of x [00:17:50] e of x okay so that's it for the training loss [00:17:53] okay so that's it for the training loss okay so now i'm going to implement [00:17:55] okay so now i'm going to implement gradient uh descent [00:17:58] gradient uh descent so [00:17:58] so i'm going to [00:18:01] i'm going to um [00:18:02] um do it this way actually so gradient [00:18:04] do it this way actually so gradient descent [00:18:05] descent like i alluded to before is actually a [00:18:07] like i alluded to before is actually a general purpose optimization so all it [00:18:10] general purpose optimization so all it needs is a function [00:18:12] needs is a function gradient access to that function an [00:18:14] gradient access to that function an initial wave vector and it's ready to go [00:18:16] initial wave vector and it's ready to go okay so i'm going to [00:18:19] okay so i'm going to initialize w to the initial wave vector [00:18:22] initialize w to the initial wave vector and then i'm going to let's say eta to [00:18:25] and then i'm going to let's say eta to 0.1 [00:18:26] 0.1 um for [00:18:28] um for a number of iterations t [00:18:31] a number of iterations t in range of 
let's just say i know 500 [00:18:34] in range of let's just say i know 500 just for fun [00:18:36] just for fun um i'm going to [00:18:38] um i'm going to uh [00:18:39] uh evaluate the function at w [00:18:42] evaluate the function at w i'm going to evaluate the gradient [00:18:45] i'm going to evaluate the gradient um and i'm just going to do the one line [00:18:49] um and i'm just going to do the one line thing of [00:18:50] thing of subtracting out a to times the gradient [00:18:52] subtracting out a to times the gradient from the existing wave vector and [00:18:54] from the existing wave vector and setting it to the new way vector [00:18:57] setting it to the new way vector and i'm going to print out [00:19:01] where i am so f of t [00:19:04] where i am so f of t w equals w [00:19:06] w equals w f of w equals uh the value [00:19:10] f of w equals uh the value and let's do the gradient just one so [00:19:13] and let's do the gradient just one so grad [00:19:15] grad gradient f [00:19:16] gradient f equals um the gradient [00:19:19] equals um the gradient okay [00:19:21] okay okay so now i just need to call gradient [00:19:23] okay so now i just need to call gradient descent [00:19:24] descent with [00:19:26] with what function am i optimizing the train [00:19:28] what function am i optimizing the train loss [00:19:30] loss the gradient of the train loss [00:19:33] the gradient of the train loss is the gradient of the train loss and [00:19:36] is the gradient of the train loss and the initial weight vector [00:19:38] the initial weight vector okay [00:19:40] okay so [00:19:41] so uh that's all i have and let's actually [00:19:43] uh that's all i have and let's actually just [00:19:44] just you know run it [00:19:47] gradient descent [00:19:48] gradient descent um [00:19:50] um so we see here that in f x 0 the wave [00:19:53] so we see here that in f x 0 the wave vector [00:19:54] vector is something [00:19:55] is something and the function value is something 
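The whole demo can be reconstructed roughly as follows. This is a sketch, not the lecture's exact code; the function names (trainLoss, gradientTrainLoss, gradientDescent) are my guesses at what was typed:

```python
import numpy as np

# Training examples (x, y) and the feature map phi(x) = [1, x]
points = [(1, 1), (2, 3), (4, 3)]

def phi(x):
    return np.array([1.0, x])

d = 2  # dimensionality of phi
initialWeightVector = np.zeros(d)

def trainLoss(w):
    # Average squared residual over the training examples
    return 1.0 / len(points) * sum((w.dot(phi(x)) - y) ** 2 for x, y in points)

def gradientTrainLoss(w):
    # Gradient of the squared residual: 2 * residual * phi(x)
    return 1.0 / len(points) * sum(2 * (w.dot(phi(x)) - y) * phi(x) for x, y in points)

def gradientDescent(F, gradientF, initialWeightVector):
    # General-purpose: only needs a function, its gradient, and a start point
    w = initialWeightVector
    eta = 0.1  # step size
    for t in range(500):
        value = F(w)
        gradient = gradientF(w)
        w = w - eta * gradient
        if t % 100 == 0:
            print(f't = {t}: w = {w}, F(w) = {value}, gradientF = {gradient}')
    return w

w = gradientDescent(trainLoss, gradientTrainLoss, initialWeightVector)
print(w)  # converges to roughly [1, 0.57], as in the lecture
```

Note how the problem specification (trainLoss, gradientTrainLoss) is separate from the optimizer (gradientDescent), which mirrors the lecture's point about decoupling the "what" from the "how".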
[00:19:50] So we see here that at iteration 0 the weight vector is something and the function value is something greater than something, and over time the function value decreases, which is a good sign. [00:20:04] The gradient of F starts becoming (0, 0), and the weight vectors are converging to (1, 0.57), as advertised. So I will declare this program working. [00:20:21] Let's just summarize what we did here. I want to set this up as follows: here is the optimization problem, which is the training examples, the feature vectors, the loss and gradient and so on, and this is a specification of what the problem we want to solve is. And then down here we have the optimization algorithm. [00:20:48] We're going to be doing this a few times throughout the course, drawing modules where we can separate the optimization problem from the optimization algorithm. Notice that the optimization algorithm doesn't depend on anything relating to machine learning at all, and the optimization problem doesn't say anything about how you solve it. So it's decoupling the what from the how, which I think is a really important thing. [00:21:16] Okay, so that was gradient descent in code, and let's summarize now. In summary, we take training data, and we have a learning algorithm that produces a predictor that can make predictions on new inputs. [00:21:31] And there are three design decisions to build your own learning algorithm. Which predictors are possible? That is the question of the hypothesis class, and we considered linear functions here, where the function is simply w dot phi of x with a particular feature map (1, x). You could imagine other things, and we'll see other things later, non-linear features and even neural networks, but it's still the question of what is the hypothesis class. [00:22:01] How good is a predictor? That's the question of what is the loss function. For regression we looked at the squared loss; later, for classification, we're going to look at the hinge loss and the zero-one loss. But this is orthogonal: for neural networks we can also use the hinge loss, the squared loss, or any of the losses. [00:22:20] And finally, how do we compute the best predictor? This is the question of what is the optimization algorithm, and for this we introduced gradient descent, which is this lovely, simple, and very effective algorithm for optimization. [00:22:36] So that concludes this module.

================================================================================ LECTURE 006 ================================================================================

Artificial Intelligence & Machine learning 3 - Linear Classification | Stanford CS221 (Autumn 2021)
Source: https://www.youtube.com/watch?v=WcaMiqJR09s

--- Transcript

[00:00:05] Hi, this module is about linear classification. We're going to go through linear classification via a simple example, just like we did for linear regression.
[00:00:14] So, as before, we have training data, which consists of a set of examples, and each example is now going to be an input (x1, x2) followed by a label y. [00:00:28] So we have three examples here: the input (0, 2) has output 1, (-2, 0) has output 1, and (1, -1) has output -1. [00:00:43] We can visualize these points, just the input part, on a 2D diagram, where I'm plotting x1 by x2. So here is (0, 2), and I'm coloring it orange to denote that it's a positive point; this is (-2, 0), also orange because it's positive; and here is (1, -1), which is blue because it's labeled as negative. [00:01:13] So given these points, we want to design a learning algorithm that can output a predictor, which in classification is known as a classifier. [00:01:22] And this classifier can take new inputs, crank them through, and produce an output label. [00:01:30] This is demonstrated on the plot as follows: the classifier in classification is going to be represented by a decision boundary. This decision boundary carves up the space into a region where the points are labeled positive and a region where the points are labeled negative. So (2, 0) is going to be predicted as a minus one in this case. [00:02:03] Okay, as before, we have three design decisions we need to settle. First, which classifiers are possible? This is a question of the hypothesis class we're going to consider: are the decision boundaries going to be straight, or can they be curved? [00:02:20] Second, how good is a classifier? This is a question of the loss function. And third, how do we compute the best classifier, a.k.a. the classifier with the lowest loss? That's going to be a question of the optimization algorithm. [00:02:37] So before we begin talking about the design space of the hypothesis class, I'm going to focus on an example linear classifier here. [00:02:49] So we have f of x equals: I'm going to define this weight vector w to be (-0.6, 0.6), and I'm going to take the dot product with a feature vector, which is going to be just the identity feature map, mapping to (x1, x2); remember, x is now a two-dimensional list of two numbers. [00:03:23] And then I'm going to take this dot product and take the sign, and remember the sign of a scalar is equal to plus one if that scalar is positive, minus one if it's negative, and zero if it is zero. [00:03:47] Okay, so let's see what this classifier does on some points. Each point is (x1, x2), so let's look at (0, 2). [00:03:59] Okay, so let's look at where (0, 2) is on the plot: (0, 2) is right here, and I'm going to represent it by this vector here. This vector is phi of x, and w is going to be this vector here; that's the weight vector. [00:04:18] And the dot product, remembering from linear algebra, is proportional to the cosine of this angle; in particular, the dot product is positive if and only if this angle is acute, and it's negative if the angle is obtuse. In this case it is acute, so this point is going to be classified as positive. [00:04:43] So let's take another point, (-2, 0). (-2, 0) is here, and this angle is also acute, so this point is also labeled as positive. [00:04:58] And the third point is (1, -1); (1, -1) is over here, and now this angle between the red and the blue is obtuse, therefore the sign is negative. [00:05:13] So you can understand how a classifier behaves geometrically, but you can also do this symbolically by following the math. If you plug in the first point, (0, 2), the dot product is 1.2, you take the sign, and you get 1. If you take the second point, the dot product is also 1.2, and you get 1. And if you take the third point, the dot product is minus 1.2, and the sign of minus 1.2 is minus 1. [00:05:45] Okay, so you can kind of see the pattern now. Any point over here that forms an acute angle with this weight vector (-0.6, 0.6) is going to be labeled as positive, and anything that forms an obtuse angle with this weight vector is going to be labeled as negative. [00:06:06] And the decision boundary is exactly those points that are perpendicular, and indeed you can see that this is a right angle here. These are the points for which the classifier just doesn't know whether it's positive or negative.
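The symbolic calculation above can be checked with a few lines of NumPy; a minimal sketch, with variable names of my own choosing:

```python
import numpy as np

w = np.array([-0.6, 0.6])  # the red classifier's weight vector

def phi(x):
    # Identity feature map: phi(x) = (x1, x2)
    return np.array(x, dtype=float)

def f(x):
    # Linear classifier: the sign of the dot product w . phi(x)
    return int(np.sign(w.dot(phi(x))))

for x in [(0, 2), (-2, 0), (1, -1)]:
    print(x, w.dot(phi(x)), f(x))
# (0, 2) and (-2, 0) get score 1.2 and label +1; (1, -1) gets score -1.2 and label -1
```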
[00:06:23] Okay, so that was one particular classifier, this one, but we can imagine other ones. We can imagine this purple classifier, which has weights (0.5, 1), and that corresponds to this point here. [00:06:42] So that is (0.5, 1), and remember, the decision boundary is the thing that is perpendicular, or normal, to the weight vector, and in 2D it's given by this line. So this purple classifier will classify all of these points plus and all of these points minus. [00:06:59] In general, the binary classifier f sub w, where w is the weights, on a particular input x is equal to: you take the dot product, and then you take the sign of that dot product. [00:07:19] And the hypothesis class, as before, is just simply the set of all possible classifiers obtained by ranging the weights over any two real numbers. [00:07:32] So that's the hypothesis class. Now let's go on to the second design decision: what is a good loss function? [00:07:39] Okay, so let's take our purple classifier and some training data, and we're going to evaluate how good this classifier is on this training data. [00:07:54] Let's go through this. Here's the classifier, and the first point is (0, 2), which was labeled as plus one; that is this point over here. [00:08:10] And this classifier predicted correctly, because it's on this side: it has a positive label, and the classifier also thinks it's positive. So we expect low loss. [00:08:22] Whereas this point over here, (-2, 0), is labeled as positive, but it's on the other side of the decision boundary, and therefore it's classified incorrectly. [00:08:37] This point, (1, -1), is over here, and it's labeled in the training data as minus; it is on this side of the decision boundary, so it's predicted as minus, therefore it is labeled correctly as well. [00:08:53] So to formalize this, we're going to define something called the zero-one loss. Just like any loss function, it takes in a particular example and a weight vector, and it looks at the prediction and the target and asks: do they disagree? [00:09:11] If they disagree, then this indicator function will return one, and if they agree, then the indicator function returns zero. So this is the zero-one loss. [00:09:22] Mathematically, you can walk through these calculations: you plug in the first point and you look at the sign; the dot product here is going to be two, and the sign of two is one. They don't disagree, so that's a zero. The second point: they do disagree, so the loss is one. And the third point: they don't disagree, so the loss is zero. [00:09:51] And as before, the training loss over the entire training set of examples is just simply the average of the per-example losses, and in this case it's one third. [00:10:07] So before we move on to the design decision of how to optimize the loss function, let's spend some time understanding two important concepts, because we can rewrite the zero-one loss in a slightly different way. [00:10:19] Recall that the predicted label on a particular input is the sign of the dot product, and the target label is y. [00:10:31] So the score, which is something we've seen before: the score on an example is simply this expression, the dot product inside the sign. And while the sign is just one or minus one, the score is a real number which intuitively represents how confident we are in predicting plus one. [00:10:56] So points over here have a large dot product with this purple weight vector and have a high score; ones on the decision boundary have zero score; ones over here have very negative scores. [00:11:10] The second concept is that of the margin, which takes into account the target label. The margin on an example is simply the score times the correct target label, and this measures how correct we are. [00:11:26] Notice that you can be confident but not correct, an important life lesson. So if y is positive, then the margin is going to be high when this score is hugely positive, and if y is minus one, then the margin is going to be high when this score is hugely negative. [00:11:52] Okay, so with these two definitions in mind, we can now look at the zero-one loss again. Remember, the zero-one loss is an indicator of whether the prediction and target disagree, but now we can represent it in terms of the margin: it's basically the indicator of whether the margin is less than or equal to zero.
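These definitions can be sketched in code; the purple classifier's weights (0.5, 1) and the lecture's training set are assumed, and the names are my own:

```python
import numpy as np

w = np.array([0.5, 1.0])  # the purple classifier's weights

def phi(x):
    # Identity feature map
    return np.array(x, dtype=float)

def score(x):
    # A real number: how confident we are in predicting +1
    return w.dot(phi(x))

def margin(x, y):
    # Score times the target label: how correct we are
    return score(x) * y

def zeroOneLoss(x, y):
    # 1 if the prediction and target disagree, i.e. the margin is <= 0
    return 1 if margin(x, y) <= 0 else 0

points = [((0, 2), 1), ((-2, 0), 1), ((1, -1), -1)]
losses = [zeroOneLoss(x, y) for x, y in points]
print(losses, sum(losses) / len(points))  # [0, 1, 0] and a training loss of 1/3
```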
[00:12:15] positive margin means that we're classifying correctly a negative margin [00:12:17] classifying correctly a negative margin means that we're classifying incorrectly [00:12:20] means that we're classifying incorrectly and we can visualize this [00:12:22] and we can visualize this as follows so here i'm plotting the [00:12:24] as follows so here i'm plotting the margin [00:12:25] margin against the loss [00:12:27] against the loss and if the margin is positive greater [00:12:30] and if the margin is positive greater than zero the loss is zero and if the [00:12:33] than zero the loss is zero and if the margin is less or equal to zero then the [00:12:36] margin is less or equal to zero then the loss is one [00:12:41] okay so that is zero one losses [00:12:43] okay so that is zero one losses expressed in the margin [00:12:45] expressed in the margin okay so now let's optimize the third [00:12:47] okay so now let's optimize the third design decision let's do [00:12:49] design decision let's do um [00:12:50] um optimize the training loss [00:12:52] optimize the training loss we want to find the minimum weight [00:12:54] we want to find the minimum weight vector that minimizes this expression [00:12:56] vector that minimizes this expression which is the average of the individual [00:12:57] which is the average of the individual losses [00:12:58] losses and let's just use gradient descent as [00:13:01] and let's just use gradient descent as we did before [00:13:02] we did before and to do it we have to compute the [00:13:03] and to do it we have to compute the gradients so the gradient of the [00:13:05] gradients so the gradient of the training loss is equal to the sum over [00:13:07] training loss is equal to the sum over the gradient of the individual losses [00:13:10] the gradient of the individual losses you look at the individual losses take [00:13:12] you look at the individual losses take the gradient and now you have to take [00:13:14] the gradient and 
[00:13:16] And now you have to take the gradient with respect to this indicator function, and that's where things go wrong. If you remember what this loss looks like, it looks like a step function, right? And what's the gradient of this function? Well, it's zero almost everywhere: zero, zero, zero, zero, then there's this discontinuity where it's undefined, and then zero, zero, zero, zero again. So remember what gradient descent is trying to do: it computes the gradient and then moves in that direction, and if the gradient is zero, gradient descent just gets stuck and can't go anywhere. So gradient descent will not work on the zero-one loss.

[00:13:56] One technical note: if someone asks you why you can't do gradient descent on the zero-one loss, your initial reaction might be "because it's not differentiable," and that is true, but it's only non-differentiable at one point.
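You can see the stuck behavior numerically: a finite-difference estimate of the zero-one loss's gradient comes out zero at (almost) any weight vector, so a gradient step goes nowhere. This little check is my own illustration, not from the lecture:

```python
def zero_one_loss(w, phi, y):
    # indicator of the margin (w . phi(x)) * y being <= 0
    return 1 if sum(wi * pi for wi, pi in zip(w, phi)) * y <= 0 else 0

def numeric_grad(f, w, eps=1e-6):
    # central finite differences in each coordinate of w
    g = []
    for i in range(len(w)):
        w_hi, w_lo = w[:], w[:]
        w_hi[i] += eps
        w_lo[i] -= eps
        g.append((f(w_hi) - f(w_lo)) / (2 * eps))
    return g

phi, y = [1.0, 2.0], +1
f = lambda w: zero_one_loss(w, phi, y)
print(numeric_grad(f, [0.5, -0.3]))   # [0.0, 0.0]: no direction to move in
```

A tiny perturbation of w almost never flips the sign of the margin, so the loss value doesn't change and the estimated gradient is exactly zero.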
[00:14:11] The real reason is that the gradient is zero everywhere, and with a zero gradient you just can't make any progress.

[00:14:19] So how do you fix this? There are a few things you can do, but one example is what is called the hinge loss. Pictorially, the hinge loss is just another loss function, the one in green here, and I'm plotting it on this margin-versus-loss plot: the zero-one loss looks like this, and the hinge loss looks like that. It's the maximum of two lines: one is this descending line, and one is this flat line at zero.

[00:14:49] Okay, so formally, what is this? The hinge loss is equal to the max over two things: the first is one minus the margin (this complicated expression is just the margin), and the second is the zero function.
[00:15:15] These two arguments to the max correspond to the two regions of the hinge loss. Okay, so let's interpret this a little bit. If the margin is greater than or equal to one, then the hinge loss is zero, but once the margin starts dipping below one, the hinge loss starts growing linearly with the margin violation.

[00:15:37] And why is there a one here and not a zero? Well, this is because we ask the classifier to predict not only correctly, but by a positive margin of safety. And just an aside: this one could really be two or three or any number, as long as it's positive, and its magnitude effectively determines the regularization strength if you're using regularizers. Don't worry if you didn't get that.

[00:16:04] Okay, so also notice that the hinge loss is an upper bound on the zero-one loss. This is cool, because suppose you optimize the hinge loss and you drive it down.
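Written out as a function of the margin, along with a quick check of the upper-bound property just mentioned (a sketch; the names are mine):

```python
def hinge_loss(margin):
    # max(1 - margin, 0): zero once the margin clears the safety threshold of 1
    return max(1 - margin, 0)

def zero_one_loss(margin):
    return 1 if margin <= 0 else 0

print(hinge_loss(2.0))   # 0: comfortably correct
print(hinge_loss(0.5))   # 0.5: correct, but inside the margin of safety
print(hinge_loss(-1.0))  # 2: misclassified; grows linearly with the violation

# the hinge loss upper-bounds the zero-one loss at every margin
for m in [-2.0, -0.5, 0.0, 0.5, 1.0, 3.0]:
    assert hinge_loss(m) >= zero_one_loss(m)
```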
[00:16:16] Driving it down is going to start pushing down on the zero-one loss, more or less. In particular, if you get the hinge loss to zero, then what is the zero-one loss? Well, it's also going to be zero, so that's a nice fact.

[00:16:27] So here's a minor digression: there are a lot of other loss functions. Here is the logistic loss, and we can just plot it on this diagram. You see that the logistic loss doesn't have this kink in it; it has a smooth transition from something that's growing linearly to something that fades away to zero. The key property of the logistic loss is that even if you're out here, say with a margin of 2, where you're classifying correctly and the hinge loss would say you get zero loss and don't need to do anything, the logistic loss is greedy: it says, well, you still have a little bit of a loss, and if you try to minimize the logistic loss you're just going to keep pushing the margin out as far as possible.
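The logistic loss being described is log(1 + e^(-margin)); unlike the hinge, it never reaches exactly zero, so there is always a little pressure to grow the margin. A small sketch (names mine):

```python
import math

def hinge_loss(margin):
    return max(1 - margin, 0)

def logistic_loss(margin):
    # log(1 + e^(-margin)): smooth everywhere, strictly positive
    return math.log(1 + math.exp(-margin))

print(hinge_loss(2.0))      # 0: the hinge is satisfied, no pressure to improve
print(logistic_loss(2.0))   # ~0.127: still some loss, keeps pushing the margin out
```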
[00:17:15] So the logistic loss is differentiable everywhere and smooth, which is nice, and it's typically known as logistic regression because it has connections to probability.

[00:17:26] Okay, so let's now go back to the hinge loss. Here is our friend the hinge loss, and here is the expression for it. Remember, it's the maximum of two expressions: this decreasing line part and the zero part, in orange and blue respectively.

[00:17:46] Okay, so now if we want to apply gradient descent to the hinge loss, we have to take the gradient. So how do we take the gradient? The gradient of the hinge loss is equal to... and now we have this max thing, which might be a little bit scary, but if you look up here, we can just do this kind of visually. What is the slope here? Well, it's whatever the slope of the orange part is, and what is the slope over here? It's the slope of this blue part.
[00:18:18] And so now we just have to switch between the two cases. In particular, if one minus the margin (the orange part) is greater than zero, that means we're in this region, and the gradient is just going to be the gradient of this expression. One is a constant, we're differentiating with respect to w, and phi of x and y are constants, so it's just going to be minus phi of x times y. And if this condition doesn't hold, otherwise, that means we're in this region, and what is the gradient of zero? Well, that's the world's easiest differential calculus problem, and it's zero.

[00:19:08] Okay, so this is the gradient of the hinge loss. And just to sanity check things: if you pick up an example and it's on this side over here, then the gradient is going to be zero and you're not going to update your weights.
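That case analysis can be written directly (a sketch; the helper name is mine):

```python
def hinge_gradient(w, phi, y):
    """Gradient of max(1 - (w . phi) * y, 0) with respect to w."""
    margin = sum(wi * pi for wi, pi in zip(w, phi)) * y
    if 1 - margin > 0:                      # the sloped (orange) piece is active
        return [-pi * y for pi in phi]      # gradient is -phi(x) * y
    return [0.0] * len(w)                   # the flat (blue) piece: gradient 0

w = [0.0, 1.0]                              # hypothetical weights
print(hinge_gradient(w, [0.0, 2.0], +1))    # margin 2 >= 1 -> [0.0, 0.0], no update
print(hinge_gradient(w, [-2.0, -1.0], +1))  # margin -1 < 1 -> [2.0, 1.0]
```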
[00:19:23] On the other hand, if you are over here, then the gradient will be non-zero; in particular, it's going to be minus phi of x times y.

[00:19:34] Okay, so now let's put things together and revisit our example. Here's the purple classifier over here, and here we have some training data, and we're going to try to compute the hinge loss on this training data along with its gradient. Remember, the hinge loss is this expression.

[00:19:56] So let's look at the first point, (0, 2). (0, 2) is here, and it's labeled as a positive. If you go and plug that point into the hinge loss, you get a max over one minus the margin, and zero. So what is the margin here? Well, it's this dot product, which happens to be two. So we have 1 minus 2, which is minus 1, and the max of minus 1 and 0 is 0.
[00:20:28] And that agrees with our intuition that the loss here should be 0, because it's correctly classified, and correctly classified by a margin of 2.

[00:20:36] Now let's look at the second point, (-2, 0). If we compute the loss here, we see that the loss is actually 2, and that makes sense, because we misclassified this point.

[00:20:59] So now if we look at the third point here, (1, -1), then the loss on this example is 1.5. So notice that even though we're classifying this point correctly, we're still incurring a loss, because the margin was only 0.5 and didn't meet the threshold. Or, I guess, sorry: the margin is 0.5, but the loss is 1.5.

[00:21:34] So now we can also compute the gradients here.
[00:21:43] The gradient on the first point is zero because the loss is zero, and generally (not always), if the loss is zero then the gradient will be zero as well. On the second one the loss is not zero, so we have a non-zero gradient, which is minus phi of x times y: it's this part times the minus sign. And the third point also has positive loss, so it has a non-zero gradient, (1, -1).

[00:22:15] Now we can compute the train loss, which is the average over the losses; that gives us 1.17. And the gradient of the train loss is just the average of the gradients, which gives us (1, -0.33).

[00:22:32] Okay, so let us now move on and concretize this in Python. Remember, last time we coded up gradient descent for linear regression, so now I'm going to just copy this, call it gradient descent hinge, and do it for the hinge loss. I'm going to use this as a starting point.
[00:23:06] I'm going to just change a few things. Let's change the training examples, because now we're working with this training data. Just to keep track of things, these are (x, y) pairs: so x now is (0, 2) with label 1, then the second point is (-2, 0), and the third point is (1, -1). So we have three points (x, y), where x is a tuple. Okay, so phi is just going to be x, and the dimension of the weight vector is still two.

[00:23:52] And now the key thing we have to do is change the definition of the loss. Before we had the average over a sum here, and instead of the squared loss I'm going to make this the hinge loss, which is the max over 1 minus the margin, and 0.
[00:24:21] All right, so this is the max of one minus the margin and zero. And the gradient of that... let's actually just copy this down, and delete this so we don't confuse ourselves. So remember, if this first expression is greater than 0, then the gradient is minus phi of x times y, if we're on that side of the curve, and otherwise it's just going to be zero.

[00:25:02] Okay, so that's it: we just changed the training examples and changed the definition of the loss function, and the optimization algorithm we don't actually have to change at all. Okay, so let's run this and see what we get.

[00:25:24] Okay, so here you see it starts out with w equals zero, and then it starts moving w to (0.5, -0.55). You see that the train loss is decreasing nicely, and actually in this case it gets to zero. Remember, the hinge loss is an upper bound on the zero-one loss, so that means the zero-one loss is also zero.
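Here is a self-contained sketch of what that modified script does: the same generic gradient descent loop, the hinge loss, and the three training points. The label assignments are my reading of the audio, so the converged weights may differ from the exact numbers spoken, but the training loss does reach zero since the data is separable:

```python
# Gradient descent on the average hinge loss, in the spirit of the lecture's demo.
points = [((0, 2), +1), ((-2, 0), +1), ((1, -1), -1)]   # (x, y) pairs (labels assumed)
d = 2                                                    # dimension of w
phi = lambda x: list(x)                                  # featurizer: phi(x) = x

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def train_loss(w):
    return sum(max(1 - dot(w, phi(x)) * y, 0) for x, y in points) / len(points)

def train_gradient(w):
    g = [0.0] * d
    for x, y in points:
        if 1 - dot(w, phi(x)) * y > 0:       # hinge active: gradient is -phi(x)*y
            for i in range(d):
                g[i] -= phi(x)[i] * y
    return [gi / len(points) for gi in g]

w, eta = [0.0] * d, 0.1
for _ in range(50):
    w = [wi - eta * gi for wi, gi in zip(w, train_gradient(w))]
print(w, train_loss(w))    # the data is separable, so the loss reaches 0
```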
[00:25:49] And the gradient also vanishes and becomes zero, meaning that we converged.

[00:25:56] Okay, so just to recap: all we did here was change the training examples, the featurizer, and the definition of the loss, and it's great that we didn't have to touch the optimization algorithm, because it was meant to be a generic piece of code.

[00:26:15] All right, so let us summarize, and in particular I'm going to contrast regression with classification, since we've seen the two of them so far. The key quantity that drives the prediction in both cases is the score, the dot product between the weight vector and the feature vector. In regression the prediction is exactly the raw score, while in classification you stick it through the sign function, so you get a one or a minus one.
[00:26:53] How is the prediction related to the target? Well, in regression we looked at the residual, which was the score minus y, and in classification we're looking at the margin. So in regression a low residual is good; in classification a high margin is good, because we want the score and y to have the same sign.

[00:27:12] Using those quantities we can define loss functions. In regression we looked at the squared loss, but as I mentioned briefly, you can also do the absolute deviation loss. In classification the story becomes a little bit stranger, because we generally care about the zero-one loss, that's our misclassification rate, but we can't optimize it, so we have to come up with a surrogate loss function, like the hinge loss, which we went into in depth, and the logistic loss, which we briefly mentioned.

[00:27:41] And given the loss functions, in both cases we use the gradient descent algorithm to optimize the loss function.
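The contrast in that summary, in code form (the numbers are made up for illustration):

```python
# Same score, two uses: regression predicts it raw, classification takes its sign.
w, phi = [0.4, -0.2], [1.0, 3.0]              # hypothetical weights and features
score = sum(wi * pi for wi, pi in zip(w, phi))

# Regression: prediction is the raw score; quality is the residual (low is good).
y_reg = 1.0
residual = score - y_reg

# Classification: prediction is the sign; quality is the margin (high is good).
y_cls = -1
prediction = +1 if score >= 0 else -1
margin = score * y_cls

print(score, residual, prediction, margin)    # score -0.2 -> predicts -1, margin 0.2
```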
[00:27:53] That's it. That concludes the unit on linear classification. Thanks for listening.

================================================================================ LECTURE 007 ================================================================================
Artificial Intelligence & Machine Learning 4 - Stochastic Gradient Descent | Stanford CS221 (2021)
Source: https://www.youtube.com/watch?v=bl2WgBLH0tI
---
Transcript

[00:00:05] Hi, in this lecture I'm going to talk about stochastic gradient descent.

[00:00:11] So recall gradient descent, which was the optimization algorithm that we decided on for optimizing all our training losses for classification and regression. Recall that the training loss is an average over all the examples in the training set of the per-example losses. So gradient descent works as follows: we're going to initialize the weight vector to 0, and then we're going to repeat t times the following update: we take the old weight vector and subtract out the step size times the gradient of the training loss.
[00:00:56] Now, this looks very simple, but if you unpack what this gradient is, it's actually an average over the gradients of the per-example losses. So now imagine you have a data set with a million examples: computing a single gradient is going to involve looping over all million examples just to get a single update, and then you take a step, and then you have to do it all over again. So this is why gradient descent is slow: it requires going through all the training examples just to make one update.

[00:01:31] So what can we do about this? The answer is stochastic gradient descent. Here is the same training loss function, and stochastic gradient descent is going to work as follows: we initialize the weight vector to zero, and then we iterate t times, and now on each epoch we're going to loop over the training examples.
[00:02:03] And after each example we perform an update on the individual loss. So here, instead of going through the training set and performing one update, we're going to go through the training set and after each example we're going to perform an update. This is going to be a lot faster in terms of the number of updates being large. Of course, there is a trade-off, because each update itself is not going to be as high quality: it only consists of one example, as opposed to all of the examples.

[00:02:41] And that's it for stochastic gradient descent. I want to talk about one small note, which is the step size. Recall the update includes a step size, which determines how far in the direction of the gradient, or away from the gradient, you want to move.

[00:03:01] Okay, so what should eta be? In general there's not really one satisfying answer to this.
[00:03:09] It's usually a hyperparameter that has to be tuned via trial and error, but here's some general guidance. The step size has to be greater than or equal to zero. If it is small, that means you're taking little steps, but your algorithm is going to be more stable and less likely to bounce around. As you increase eta larger and larger, you're taking more aggressive steps, so you can move faster, but perhaps at the risk of being a bit more unstable.

[00:03:38] So, two typical strategies for setting the step size: one is using just a constant step size; so far we've used eta equals 0.1, a kind of arbitrary number. Or you can do a decreasing step size, where eta is one over the number of updates that you've made. The intuition here is that in the beginning you're far away from the optimum, so you want to move quickly.
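Both ideas, one update per example and a decreasing step size, fit in a few lines. The dataset below is my own synthetic one (labels generated from a known weight vector, so we can tell whether learning worked); it is not the one built in the lecture:

```python
# Stochastic gradient descent on the hinge loss with a decreasing step size.
# Labels come from sign(x1 - 2*x2), i.e. a hidden vector true_w = [1, -2].
data = [((x1 / 10, x2 / 10), +1 if x1 - 2 * x2 >= 0 else -1)
        for x1 in range(-10, 11) for x2 in range(-10, 11)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def train_loss(w):
    return sum(max(1 - dot(w, x) * y, 0) for x, y in data) / len(data)

w = [0.0, 0.0]
num_updates = 0
for epoch in range(3):
    for x, y in data:                      # one update per example, not per pass
        num_updates += 1
        eta = 1 / num_updates              # decreasing step size
        if 1 - dot(w, x) * y > 0:          # hinge gradient: -x*y if active, else 0
            w = [wi + eta * xi * y for wi, xi in zip(w, x)]

print(num_updates, train_loss(w))          # 3 epochs -> 3 * len(data) updates
```

After three epochs the training loss is below its starting value of 1.0 (the loss at w = 0, where every margin is zero). With gradient descent the same number of passes would have produced only three updates.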
[00:04:15] So now let us explore stochastic gradient descent in Python. I'm going to code it up and see what happens. Okay, so remember last time we did gradient descent, so I'm going to copy this code over, and what we're going to do is modify it to do stochastic gradient descent.

[00:04:50] Okay, so just to recall: last time we set up some training examples, we defined the loss function, and then we had this generic optimization algorithm. Now, to really tell the difference between gradient descent and stochastic gradient descent, I'm going to make a larger data set, and I'm going to do it in a way so that it's large but structured, so that we know what the right answer is, because otherwise how can we verify that it did the right thing?

[00:05:17] To do this, a general trick is to generate synthetic data from a ground truth, and then try to recover that ground truth. So suppose we had some true weight vector. This is our secret, which is unknown to the learning algorithm, but we hope that the learning algorithm will recover it. Then we're going to define a function called generate, which uses this true w to generate an example. Here I'm going to generate x by randomly sampling a five-dimensional input point, and then I'm going to set y to be true w dot x. So the examples are generated from the true weight vector, and then I'm just going to add some noise.

[00:06:18] Okay, and then I'm going to set the training examples to be just calls to generate, for, let's say, one million examples. That's a lot of examples. All right, let's see what this data looks like; I'm going to print out x and y just to see what is coming out. Oops, I had a typo here. Okay, so here is the data set that we are going to train on: each input x is a five-dimensional vector, the output is a scalar, and there are a lot of examples.

[00:07:10] All right, so I need to update the feature vector to be just x, the identity, and the initial weight vector has to match the dimensionality of the true weight vector. Everything else, the training loss and gradient, I'm going to leave alone. Okay, so now let's uncomment this line and run gradient descent, and let's see what happens.
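A sketch of the synthetic-data setup being described (the on-screen code is not in the transcript, so the noise scale and details here are guesses; the true weight vector [1, 2, 3, 4, 5] is the one the lecture later tries to recover, and I use fewer than the lecture's one million examples to keep the sketch light):

```python
import random

true_w = [1, 2, 3, 4, 5]  # the "secret" true weight vector

def generate():
    # Sample a random five-dimensional input point.
    x = [random.random() for _ in range(5)]
    # Output is true_w . x plus a little noise.
    y = sum(wj * xj for wj, xj in zip(true_w, x)) + random.gauss(0, 1)
    return (x, y)

# The lecture uses 1,000,000 examples; 100,000 keeps this sketch light.
train_examples = [generate() for _ in range(100000)]
print(train_examples[0])  # one (x, y) pair
```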
[00:07:42] Okay, so it's going to generate the data, and now, to compute a single gradient, it has to enumerate over one million examples. So this is going to be quite slow. It'll finish the first epoch, and it has some values, and then the second epoch, and it seems like it's making some progress. Remember, we want to see if this can hit one, two, three, four, five. The loss is going down, which is good, and it seems like it's moving in the right direction, but it's pretty slow, and I'm just going to stop it there because I don't want to wait forever.

[00:08:26] Okay, so now let's do stochastic gradient descent. First I need to change the interface, because gradient descent only had access to f and the gradient of f, and now stochastic gradient descent needs access to the individual losses. So I'm going to define, actually I'll just call this, the loss of w, and I'm going to use i here to denote an index into one of the terms in the sum. So the loss is just going to be one of these terms, and the term I'm going to select out is just the ith data point. Okay, and similarly, the gradient of the loss is going to be just the gradient, but for the ith data point, and this also takes in the index i. So now, if I feed in various values of i, I can access the loss and the gradient of that loss for any given weight vector.

[00:09:46] All right, so now let's go over to the optimization algorithm, and let me do stochastic gradient descent. Okay, so I'm going to call this stochastic gradient descent, and, just to distinguish things, I'm going to use lowercase f for the individual components of the objective function.
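The per-example loss and gradient just described might look like this in code: a sketch assuming squared loss with identity features (consistent with the regression setup), with a tiny stand-in data set so the snippet runs on its own.

```python
# Tiny stand-in for the generated training set: (x, y) pairs.
train_examples = [([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def loss(w, i):
    # The i-th term of the training loss: (w . x_i - y_i)^2.
    x, y = train_examples[i]
    return (dot(w, x) - y) ** 2

def gradient_loss(w, i):
    # Gradient of that single term: 2 * (w . x_i - y_i) * x_i.
    x, y = train_examples[i]
    scale = 2 * (dot(w, x) - y)
    return [scale * xj for xj in x]
```

Feeding in different values of i picks out different terms; for example, `loss([0.0, 0.0], 0)` evaluates to 4.0.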
[00:10:10] Okay, so I'm going to initialize the weight vector, and I'm going to use a different step size here, just for fun. I was going to initialize it with one... actually, let me do this instead: I'm going to set the step size to be one over the square root of the number of updates, and each time I do an update, I'm going to increase the number of updates. Actually, let me do it in this order. Okay, so the number of updates starts at zero, and then, remember, in stochastic gradient descent, I'm going to loop over the components of the objective function, so i from 0 to n minus 1.

[00:11:11] Another thing I'm going to have to pass in is the number of components, which I'm going to use to index into f. So now this is f of w and i, and gradient of f of w and i, and then I'm going to move everything inward. Okay, so now, to call this function, I'm going to run stochastic gradient descent with the loss and the gradient of the loss, and I'm going to pass in n, which is the number of training examples, and an initial weight vector.

[00:11:53] Okay, so let's just review what's going on here. Stochastic gradient descent takes a function which can access individual components of the objective. It initializes the weights and then iterates some number of times, and in each epoch it loops over all the examples, computes the value, computes the gradient, and then does a gradient update, and here I'm using the step size, which is one over the square root of the number of updates I've made so far.

[00:12:32] Okay, so let's see stochastic gradient descent in action now. I have two returns here, so that is a syntax error; let me fix that. So now it's going through one million examples... oh, I need to import math as well. So it's going to loop over one million examples, but after each example it's going to perform an update, and so when it prints out, it will already have taken one million steps of stochastic gradient descent.

[00:13:11] And look what happened here: after the first epoch, it's already quite close to one, two, three, four, five. The function value doesn't really mean as much, because it's only for an individual point, but you can see that the weight vector is converging quite nicely. And this shows that stochastic gradient descent, even with just one pass over the training data, can sometimes get much closer to the optimum than if you were to do many, many rounds of gradient descent.
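Putting the whole walkthrough together, a reconstruction might look like the following. This is not the lecture's exact code: the data set here is small and noiseless so the sketch runs quickly, but the shape (per-example f and gradient, one update per example, step size 1/sqrt(number of updates)) follows the description above.

```python
import math
import random

random.seed(0)
true_w = [1.0, 2.0, 3.0, 4.0, 5.0]

# Small noiseless synthetic set (the lecture uses 1,000,000 noisy examples).
train = []
for _ in range(1000):
    x = [random.random() for _ in range(5)]
    train.append((x, sum(wj * xj for wj, xj in zip(true_w, x))))

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def f(w, i):
    # Per-example squared loss: one component of the objective.
    x, y = train[i]
    return (dot(w, x) - y) ** 2

def gradient_f(w, i):
    x, y = train[i]
    s = 2 * (dot(w, x) - y)
    return [s * xj for xj in x]

def stochastic_gradient_descent(f, gradient_f, n, w, epochs=5):
    num_updates = 0
    for epoch in range(epochs):
        for i in range(n):  # one update per example
            grad = gradient_f(w, i)
            num_updates += 1
            eta = 1.0 / math.sqrt(num_updates)  # decreasing step size
            w = [wj - eta * gj for wj, gj in zip(w, grad)]
        print(f"epoch {epoch}: w = {[round(wj, 2) for wj in w]}")
    return w

w = stochastic_gradient_descent(f, gradient_f, len(train), [0.0] * 5)
```

After a few epochs the weight vector should be close to the true [1, 2, 3, 4, 5], mirroring the fast convergence seen in the lecture.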
[00:13:50] Okay, so that was stochastic gradient descent in Python. So let's summarize. We want to optimize this training loss, which is an average over the per-example losses. We looked at gradient descent, which takes a step on the gradient of the training loss, and we also looked at stochastic gradient descent, which picks out individual examples and updates after computing the gradient of each individual example. And on this example we've shown that stochastic gradient descent wins. The key idea behind stochastic updates is that it's not about quality, it's about quantity. So maybe not a general life lesson, but it seems like in this case it is wiser to keep in mind what you're trying to do, which is to optimize this objective, rather than to compute the gradient, which is only a means to an end. Okay, so that concludes the module on stochastic gradient descent. Thanks for listening.

================================================================================ LECTURE 008 ================================================================================

Artificial Intelligence and Machine Learning 5 - Group DRO | Stanford CS221: AI (Autumn 2021)

Source: https://www.youtube.com/watch?v=ZFK2XtWqUbw

---

Transcript

[00:00:05] Hello. In this module I'm going to first show you how minimizing the average error on your training examples can actually lead to disparities in performance between groups, and then I'm going to show you a simple approach, called group distributionally robust optimization, that can mitigate some of these disparities.

[00:00:24] Let me begin with a very famous example of disparities, or inequalities, in machine learning, called the Gender Shades project. In this project the authors collected a data set of images of faces of different genders and different skin tones.
[00:00:42] Then they evaluated gender classifiers from Microsoft, Face++, and IBM. What they found was rather striking. For the group of lighter-skinned males, the classifiers were almost perfect, but if you look at the performance of those classifiers on darker-skinned females, you'll see that the accuracies are much, much worse. So this is a general problem in machine learning: inequalities between different groups arise because machine learning generally minimizes the average loss.

[00:01:26] These inequalities can have real-world consequences. In one vivid case, a Black man was wrongly arrested due to an incorrect match with another Black man captured in a surveillance video, a mistake made by a facial recognition system. Given what we just saw in the Gender Shades project, we can see that lower accuracies for some groups might lead to more false arrests, which adds to already problematic inequalities that exist in our society today.

[00:01:58] So in this module I'm going to focus on this issue of performance disparities between groups and how we might be able to mitigate them. But I also want to highlight that, even if we didn't have any disparities between groups, there's a question of whether facial recognition technology should be used in law enforcement, or in surveillance, or at all. These are big, thorny ethical questions which, unfortunately, we're not going to be able to spend much time on in this module, but I want to highlight that it's important to remember that sometimes the issue is not with the solution but with the framing of the problem itself.

[00:02:37] Gender Shades was an example of classification, but to make things simpler, let us consider our friend, linear regression.
[00:02:45] So recall that in linear regression we start with a training set which consists of examples, where each example has an input x and an output y. But in our case we're going to assume each example is also annotated with a group g. So let's plot this over here: here's (1, 4), and here's a second example, (2, 8), which is up here. And then these examples down here are going to come from group B, so we're going to have two groups, A and B, and here they are.

[00:03:24] Okay, so the goal of machine learning, or linear regression in particular, is to produce a predictor, such as this one. The predictor is going to take new inputs, such as 3, and produce an output, such as 3.27.

[00:03:43] In linear regression we assume that the predictor has the form of a weight vector dotted with a feature vector, phi of x. In this simple example we're going to restrict ourselves to the case where the feature vector is simply the identity map, just x, which gives us a hypothesis class that is the set of all lines through the origin. So you can think about sweeping lines through the origin here, and the weight vector is just going to be a single number, w.

[00:04:19] Already you can see some tension here: which weight vector would you choose? Would you choose one that's closer to these points in group B, or in group A? This tension means that we have to compromise somehow, and exactly how we compromise is going to have implications.

[00:04:41] Notice also that the predictor doesn't use the group information; it just takes an input x as before. What's going to use the group information is the learning algorithm, and we'll get to that a little bit later.
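As a tiny illustration of this hypothesis class: with identity features the predictor is just f_w(x) = w * x, a line through the origin. The weight 1.09 below is my inference from the quoted example (input 3, output 3.27), not a number stated in the lecture.

```python
def predict(w, x):
    # Linear predictor with identity features: a line through the origin.
    return w * x

print(round(predict(1.09, 3), 2))  # the lecture's example input
```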
[00:04:55] So, just as a review: for linear regression we define the loss function on an input x, an output y, and a particular weight vector to be simply the squared difference between the predicted value of the predictor, f_w(x), and the target value y. And remember that we defined the training loss of a particular weight vector as follows: it's the average (so one over the number of training examples) of the sum over training examples of the per-example loss.

[00:05:32] Visually, we can see this on this plot, where for each value of w (in our case, remember, w is a scalar) we get a loss value. So this is the training loss, which is this curve here. Let's practice evaluating this training loss at a particular value of w, say 1. This is going to take the average over this data set and return some value: 7.5. Okay, so the average loss at w equals 1 is 7.5.

[00:06:19] That seems okay, but now let's peer a little bit closer at how the loss is spread across groups. We're going to define a notion of a per-group loss. Here's our training set: for group A, what is the loss, and for group B, what is the loss? Formally, we define the per-group loss, written TrainLoss_g for group g (where g can be either A or B), to be the average, now only over those examples in that group, of the per-example loss. This notation, D_train(g), is the set of examples in group g.

[00:07:03] Okay, so we're going to plot these two losses on this curve here, and we see that we have these two plots: TrainLoss_A looks like this and TrainLoss_B looks like that.
we can practice evaluating um these different loss functions at our uh [00:07:24] different loss functions at our uh example weight vector one here so train [00:07:27] example weight vector one here so train loss a is going to be an average [00:07:29] loss a is going to be an average remember only over the examples in group [00:07:32] remember only over the examples in group a [00:07:33] a and that's going to give us 22.5 [00:07:36] and that's going to give us 22.5 you can see [00:07:37] you can see it looks like about 22.5 here [00:07:40] it looks like about 22.5 here and then what about b [00:07:42] and then what about b so b actually gets a loss of zero [00:07:46] so b actually gets a loss of zero which you can see at this point [00:07:48] which you can see at this point so you can see that we have a single [00:07:51] so you can see that we have a single wave vector [00:07:53] wave vector one [00:07:54] one gets very different losses on the two [00:07:57] gets very different losses on the two data sets [00:07:58] data sets on the two groups a is doing a lot worse [00:08:01] on the two groups a is doing a lot worse it has 22.5 and b is doing [00:08:04] it has 22.5 and b is doing much better it has a zero which is the [00:08:06] much better it has a zero which is the minimum loss you can hope for [00:08:09] minimum loss you can hope for so this is an example of a disparity [00:08:11] so this is an example of a disparity between [00:08:12] between if we were to choose wave vector one [00:08:14] if we were to choose wave vector one there would be a huge disparity between [00:08:16] there would be a huge disparity between the performance on [00:08:17] the performance on these two groups [00:08:20] so um [00:08:22] so um so we can look at the losses of [00:08:23] so we can look at the losses of different groups but it'll be helpful to [00:08:25] different groups but it'll be helpful to kind of summarize that as a single [00:08:27] kind of summarize that as a single 
[00:08:28] That number we're going to capture by a quantity called the maximum group loss, and you might guess from the name that the maximum group loss, written TrainLoss_max, is simply going to be the maximum over all groups of the per-group loss.

[00:08:49] Visually, what this looks like is as follows. Remember we had, in orange, the loss of group A, and in blue, the loss of group B. The maximum group loss is a function of w, like the other functions, and it is the pointwise maximum: at every point we choose whichever of the losses of A or B is larger. So, as you can see, it traces out an upper envelope: over here the loss of A is higher, so it's going to track that, and over here the loss of B is higher, so it kind of hugs B from there on.
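These definitions can be sketched in a few lines of Python. The toy dataset below is hypothetical, chosen only so that the numbers match the lecture's example (average loss 7.5, group A at 22.5, group B at 0 when w = 1):

```python
# Hypothetical 1-D regression data with predictor f(x) = w*x and squared loss.
# Chosen so that at w = 1: group A's loss is 22.5, group B's is 0, and the
# overall average loss is 7.5, matching the lecture's numbers.
D_train = [
    ("A", 2.0, 8.0),  # (group, x, y): residual at w=1 is -6, loss 36
    ("A", 1.0, 4.0),  # residual at w=1 is -3, loss 9
    ("B", 1.0, 1.0), ("B", 2.0, 2.0), ("B", 3.0, 3.0), ("B", 4.0, 4.0),
]

def per_example_loss(w, x, y):
    return (w * x - y) ** 2

def train_loss(w):
    """Average loss over all examples."""
    return sum(per_example_loss(w, x, y) for _, x, y in D_train) / len(D_train)

def group_loss(w, g):
    """TrainLoss_g: average loss over only the examples in group g."""
    pts = [(x, y) for grp, x, y in D_train if grp == g]
    return sum(per_example_loss(w, x, y) for x, y in pts) / len(pts)

def max_group_loss(w):
    """TrainLoss_max: the worst per-group loss (pointwise maximum)."""
    return max(group_loss(w, g) for g in {grp for grp, _, _ in D_train})

print(train_loss(1.0), group_loss(1.0, "A"), group_loss(1.0, "B"), max_group_loss(1.0))
# → 7.5 22.5 0.0 22.5
```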
[00:09:38] Okay, so let's evaluate at our point w = 1. Remember from the previous slide that the two losses are 22.5 and 0 for the two groups. To compute the maximum, we just take the max of these two values, and you get 22.5. So 22.5 is a single number that summarizes how bad the worst group is; that's the maximum group loss. And if you compare the maximum group loss, 22.5, with the average loss, which is 7.5, you'll see that the maximum group loss is larger, and it's always at least as large.

[00:10:24] So now let's compare these two loss functions. We have the average loss and the maximum group loss, and we can plot both of these here, so pictorially we can see what's going on. Let me just plot our data points as well, so we have them available. So these functions are definitely very different.
[00:10:55] Okay, so what happens now when we try to minimize the average loss versus the maximum group loss? Let's start with minimizing the average loss; this is standard learning, the status quo. You find the minimum of the average loss, which is going to be this point, w = 1.09, and it gets a loss of 7.29. So it looks like you're doing pretty well, but if you look at the worst group loss of that weight vector, you'll see that it's above 20, which is not great.

[00:11:30] So what you can do instead is what we call group distributionally robust optimization, or group DRO, which is simply going to minimize the maximum group loss; it's going to minimize this purple plot here. And what happens when you do that? You get w = 1.58, which gets a loss of 15.69, which is better than the 20-plus that you were at before.
[00:12:00] Now, of course, the average loss is worsened, because at this point the average loss on the red curve is a little bit higher, so there's a trade-off here. And we can see this tension play out on this plot over here. Here we see that if you were to minimize the average loss, you would find a regressor, or model, that's very close to the points over here in B, because there are four of them: the majority group kind of dominates. Whereas if you minimize the maximum group loss, you get this purple line, which is able to balance out the two groups no matter how many points are in one versus the other. So you can think of the purple line as more fair, because it treats the groups more equally.

[00:12:49] So how do we minimize the maximum group loss? As before, we're going to use gradient descent and follow our nose.
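This follow-your-nose procedure can be sketched as follows. The 1-D dataset, step size, and iteration count here are illustrative assumptions, so the resulting minimizer differs from the lecture's plotted numbers:

```python
# Sketch of group DRO by gradient descent on the maximum group loss:
# at each step, find the worst-off group (the argmax g*) and take a
# gradient step on that group's loss only. The toy 1-D dataset with
# predictor f(x) = w*x and squared loss is hypothetical.
D_train = [("A", 2.0, 8.0), ("A", 1.0, 4.0),
           ("B", 1.0, 1.0), ("B", 2.0, 2.0), ("B", 3.0, 3.0), ("B", 4.0, 4.0)]
GROUPS = ("A", "B")

def group_loss(w, g):
    pts = [(x, y) for grp, x, y in D_train if grp == g]
    return sum((w * x - y) ** 2 for x, y in pts) / len(pts)

def group_grad(w, g):
    pts = [(x, y) for grp, x, y in D_train if grp == g]
    return sum(2.0 * (w * x - y) * x for x, y in pts) / len(pts)

w, eta = 0.0, 0.01
for _ in range(1000):
    g_star = max(GROUPS, key=lambda g: group_loss(w, g))  # group hurting most
    w -= eta * group_grad(w, g_star)                      # update only on it

print(round(w, 2))  # settles near the point where the two group losses cross
```

With a fixed step size the iterate hovers around the crossing point of the two group losses; the resulting maximum group loss is lower than what the average-loss minimizer achieves on this data.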
[00:13:02] So what does this look like? Let me plot this. So here's the objective function: the maximum group loss, TrainLoss_max, is, remember, the maximum over all the groups of the per-group training loss. And so how do you take the gradient of a max? Well, the gradient of a max, remember, is equal to the gradient of the function being maximized, evaluated at the particular value g*, where g* is the argmax over groups of the per-group training loss.

[00:13:47] So let's look at this picture. Basically, we want to take the gradient of this purple curve, right? And if you're over here, the gradient of the purple curve is exactly the gradient of the loss on group A, and if you're over here, the gradient of the maximum group loss is exactly the gradient of the loss on group B.
[00:14:18] This exactly corresponds to the fact that g* is A over here, because group A is worse, and g* is B over here, because group B is worse.

[00:14:29] So to compute the gradient, it's actually very simple: you first evaluate, at your current weight vector, the losses of the different groups; you look at the group that is hurting the most, that has the highest loss; and then you just update on those examples. So it's a very intuitive process: you find which group needs the most help, and then you update your parameters based only on that group.

[00:14:59] So one note is that it's important that we're talking about gradient descent here, not stochastic gradient descent, because stochastic gradient descent relies on the objective function being a sum over terms, but this is a maximum over a sum, so it won't work exactly as is.
How exactly to get stochastic methods to work properly is beyond the scope of this module, but you can read the notes for pointers.

[00:15:30] Okay, so let me summarize. We've introduced the setting where examples are associated with groups. We've done this for regression, but it generalizes to classification and to more general machine learning problems. We saw that we have the average loss and the maximum group loss, and these are different: what is good on average is not going to be good for all groups. And we see that there's always a tension between the groups, if the groups are pulling you in somewhat different directions. And we saw that group distributionally robust optimization, or group DRO, is a very simple algorithm that minimizes the maximum group loss, the purple curve over here.

[00:16:19] Finally, I want to remark that this module has kept things simple, but there are many, many more nuances.
Intersectionality is the principle that a group such as, you know, white women is actually defined by multiple attributes, and such groups might behave differently than the coarser groups of women, or the set of white people, so we have to take finer-grained groups into account. There are also cases where we might not know what the groups are, maybe because we don't collect demographic information, and we have to infer them. There's also the issue of overfitting: we're talking only about the training loss here, just for simplicity, but of course what we care about in machine learning is doing well on a test set, which we're not talking about here. So there are many more references in the notes, and I hope this has piqued your interest.
I hope you come away realizing that inequality should be considered a first-class citizen when we think about machine learning methods. So that's it; thank you.

================================================================================ LECTURE 009 ================================================================================
Artificial Intelligence & Machine Learning 6 - Non Linear Features | Stanford CS221: AI(Autumn 2021)
Source: https://www.youtube.com/watch?v=eIxbNkB4byY
--- Transcript

[00:00:05] Hi, in this module I'm going to show you how you can use the machinery of linear predictors that we've developed so far to get some non-linear predictors. We're going to first focus on regression and then later talk about classification.

[00:00:19] So remember, in regression we're given some training data, and we have a learning algorithm that produces a predictor. The first key question, or design decision, is: which predictors is the learning algorithm allowed to choose from? That's the question of the hypothesis class.

[00:00:37] So for linear predictors, remember that the hypothesis class is defined to be the set of all predictors f(x) = w · φ(x), that is,
set of all predictors f x equals some weight vector dot some feature [00:00:49] some weight vector dot some feature vector phi of x [00:00:52] vector phi of x and we allow the wave vector to range [00:00:54] and we allow the wave vector to range freely over all d dimensional real [00:00:56] freely over all d dimensional real vectors okay so if we take phi of x [00:01:00] vectors okay so if we take phi of x equals one comma x like we did before [00:01:03] equals one comma x like we did before then [00:01:04] then we can get some lines [00:01:06] we can get some lines so if we set the weight vector to be one [00:01:09] so if we set the weight vector to be one comma zero point five seven then we get [00:01:11] comma zero point five seven then we get this line with an intercept at 1 and a [00:01:14] this line with an intercept at 1 and a slope of 0.57 [00:01:16] slope of 0.57 and here's a purple one with the [00:01:18] and here's a purple one with the intercept [00:01:19] intercept of 2 and a slope of 0.2 [00:01:23] of 2 and a slope of 0.2 so all is good [00:01:25] so all is good but what happens if we get data that [00:01:27] but what happens if we get data that looks like this if you try to fit a line [00:01:29] looks like this if you try to fit a line through it you won't be very happy with [00:01:31] through it you won't be very happy with this [00:01:32] this you really want to fit some sort of [00:01:34] you really want to fit some sort of non-linear predictor something that can [00:01:35] non-linear predictor something that can curve around to fit the data [00:01:39] curve around to fit the data so your first reaction might be to reach [00:01:41] so your first reaction might be to reach for something like neural networks or [00:01:43] for something like neural networks or decision trees something that's more [00:01:45] decision trees something that's more complex but let's see how far we can get [00:01:47] complex but let's see how far we can get with just using 
[00:01:51] So the key thing is that the feature vector can be arbitrary. So let's take the feature vector to be [1, x] as before, but let's just add on an x² term, just for fun. So, for example, if we feed in x = 3, then we get the feature vector [1, 3, 9]. Let's define some weights, [2, 1, -0.2], and let's plot what that function looks like, and we get a nice curve. So that's a non-linear predictor: it has an intercept of 2, a slope of 1 at the origin, and a curvature of -0.2. Here's another one, [4, -1, 0.1]: there's an intercept of 4, a slope of -1, and a curvature of 0.1. And here's another one, [1, 1, 0].
So what does this one look like? This one just looks like a line, because we've used a zero weight on the x² term, so it just reduces to a linear predictor. In general, we can define the family of all quadratic predictors by letting the weight vector range freely over all three-dimensional vectors. So here is our first example of getting a non-linear predictor, in particular quadratic predictors, just by changing φ.

[00:03:26] One small note here is that in one dimension, x² is just a single feature, but if x were d-dimensional to begin with, then to get the full range of quadratic predictors we would need d² features, one for every (x_i, x_j) pair. So that would be a lot; that's one slight disadvantage of using the machinery of linear predictors to get non-linear predictors. Let's move on.
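The quadratic feature map and predictor just described can be sketched in plain Python, using the lecture's example weights:

```python
def phi(x):
    """Quadratic feature map from the lecture: phi(x) = [1, x, x^2]."""
    return [1.0, x, x * x]

def f(w, x):
    """Predictor w . phi(x): linear in w, non-linear in x."""
    return sum(wj * fj for wj, fj in zip(w, phi(x)))

print(phi(3.0))              # → [1.0, 3.0, 9.0]

w = [2.0, 1.0, -0.2]         # intercept 2, slope 1 at origin, curvature -0.2
print(f(w, 0.0))             # → 2.0  (the intercept)
print(round(f(w, 3.0), 3))   # → 3.2  (2 + 3 - 0.2*9)
```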
So quadratic predictors are great, but they can only vary smoothly. What happens if you want a function that looks like this? So here's an example of a piecewise constant predictor, and we can get this predictor, too, by just reimagining what a feature vector is. So here I'm going to define φ(x) as follows: first I'm going to carve up the input space into a bunch of regions, and define each feature to be whether x lies in that region or not. The first feature tests whether x is between 0 and 1, and the indicator function will return 1 if that's true and 0 otherwise; the second one is going to test whether x is between 1 and 2; and so on. So here's an example: if you punch in 2.3, it is 0 on all the features (regions) except for this one.

[00:04:59] Okay, so if I set the weight vector to [1, 2, 4, 4, 3], then I get this function.
And notice that each weight is just identifying the function value on its region: between 0 and 1 the function is at 1, and then it's 2, and then it's 4, and then it's 3. Okay, so here's another one: it's 4, and then 3, 3, 2, 1.5. And again, in general, the set of predictors is w · φ(x), where w can range freely. So this is a general technique, piecewise constant functions, which can give you expressive non-linear predictors by partitioning the input space.

[00:05:53] Again, a caveat is that everything looks nice in one dimension, but if x were d-dimensional and each dimension were carved up into B regions, then you would have B^d different features, which is an exponential number of features, which is kind of a no-go.

[00:06:14] So you can kind of get the idea now, but let's just do another example.
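A sketch of this piecewise constant construction, assuming unit-width regions on [0, 5) to match the five weights in the first example above:

```python
# Piecewise constant features: one indicator per unit-width region.
# The region boundaries [0,1), [1,2), ..., [4,5) are an assumption
# chosen to match the lecture's five example weights.
def phi(x):
    """phi_j(x) = 1[j <= x < j+1] for j = 0..4."""
    return [1.0 if j <= x < j + 1 else 0.0 for j in range(5)]

def f(w, x):
    return sum(wj * fj for wj, fj in zip(w, phi(x)))

w = [1, 2, 4, 4, 3]    # each weight is the function value on its region

print(phi(2.3))        # → [0.0, 0.0, 1.0, 0.0, 0.0]  (only region [2,3) fires)
print(f(w, 2.3))       # → 4.0  (the function value on region [2, 3))
```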
[00:06:25] Suppose you're trying to predict a function with some periodic structure, like you're trying to predict traffic patterns or sales across a year. So imagine that you want to get a function that looks like this. Okay, so let's see if we can hack together a feature vector that does that: φ(x) = [1, x, x²], so put in the quadratic, and now let's add a cosine term, cos(3x), which is kind of arbitrary.

[00:06:50] So here's an example: if you punch 2 into x, then you get this feature vector. If you define the weights in a certain way, then you get that red curve; define the weights this way, and you get the purple curve; and so on. So here the key idea is that you can really go wild: you can throw in any sort of features you want and get all sorts of wacky-looking predictors, all using the machinery of linear predictors.
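Such a hand-built periodic feature vector can be sketched directly; the cos(3x) frequency is taken from the lecture, while the weights here are illustrative assumptions:

```python
import math

# "Go wild" feature map: quadratic terms plus a periodic one.
# The weights below are hypothetical, just to show evaluation.
def phi(x):
    return [1.0, x, x * x, math.cos(3 * x)]

def f(w, x):
    return sum(wj * fj for wj, fj in zip(w, phi(x)))

w = [1.0, 0.5, -0.05, 2.0]   # hypothetical weights

print(phi(2.0))   # feature vector at x = 2: [1.0, 2.0, 4.0, cos(6)]
print(f(w, 2.0))  # the predictor still evaluates as a plain dot product
```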
[00:07:23] So you might say: wait a minute, wait a minute, how are we able to get all this expressive non-linear capability when we haven't really changed the learning algorithm, and it's still supposed to be a linear predictor, right? Well, that's because the word "linear" is a little bit ambiguous here. So remember, the prediction is w · φ(x); that's the score. And the question is: linear in what? Is the score linear in w? Yes, because the score is just some constant times w. Is it linear in φ(x)? Yes, because it's something times φ(x). How about: is it linear in x? Well, the answer is no, because φ(x) can be arbitrary, so the score doesn't have to be linear in x.

[00:08:14] And the key idea behind non-linearity is that there are two ways of viewing it. From the point of view of gaining expressive non-linear predictors, this is great, because you can define φ(x) to be anything and get arbitrary non-linear functions out.
[00:08:29] But from the point of view of having to learn such a model, it's actually also great, because the score is a linear function of w, and when you're learning, you take the gradient with respect to w, so the score is just a linear function and life is great. In fact, the learning algorithm doesn't even care what φ is: it only looks at the data through the lens of φ(x). It doesn't know whether you gave it x and then applied φ, or you just gave it φ(x) directly.

[00:09:00] Okay, so now let's turn from regression to classification. The story is pretty much the same: you can define arbitrary features and get non-linear classifiers. But just to review: remember, in linear classification in two dimensions, you define the feature vector to be [x1, x2], and you define the predictor as the sign of the score.
partition the space: it defines the decision boundary which separates the region of the space which is labeled plus from the region of the space which is labeled minus. [00:09:39] Okay, so now what does non-linear mean? Well, if you look at f of x, because of the sign function it's already non-linear, so that notion doesn't really make sense. Instead, non-linearity for classification refers to whether the decision boundary is linear or not; in particular here, is it a line? And if we define the feature vector as x1, x2, then we just get a line. [00:10:04] So now let's try to do something a little bit more interesting: let's see if we can define a quadratic classifier. Suppose we wanted to define a classifier that looks like this. The decision boundary is a circle, where inside the circle we want to label plus and outside we want to label minus. Okay, so how are we going to do that? Well, let's start with a feature
vector equal to x1, x2 as we had before, and now we're just going to tack on a quadratic term, x1 squared plus x2 squared. [00:10:44] And now, if you define the corresponding weight vector to be 2, 2, minus 1, then I claim that this gives you exactly this decision boundary, which is a circle. There's some algebra that you can do, which I'm going to skip over, but you can rewrite this expression as follows: the same f of x is equal to 1 if this quadratic form is less than or equal to 2. So what is this? You might remember from your algebra and trigonometry days that this is the squared distance from a point to the point (1, 1). So in particular, if I constrain the squared distance to be less than or equal to 2, then this is the region of points within radius square root of 2 of the point (1, 1).
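The claim above is easy to check numerically. A minimal sketch, using the feature vector and weight vector exactly as stated in the lecture:

```python
import numpy as np

# phi(x) = [x1, x2, x1^2 + x2^2] and w = [2, 2, -1], as in the lecture.
def phi(x1, x2):
    return np.array([x1, x2, x1 ** 2 + x2 ** 2])

w = np.array([2.0, 2.0, -1.0])

def f(x1, x2):
    # sign(w . phi(x)): +1 inside the circle of radius sqrt(2) at (1, 1)
    return 1 if w @ phi(x1, x2) >= 0 else -1

print(f(1, 1))    # center of the circle -> 1
print(f(1, 2))    # squared distance to (1,1) is 1 <= 2 -> 1
print(f(3, 3))    # squared distance to (1,1) is 8 >  2 -> -1
```

The score 2*x1 + 2*x2 - (x1^2 + x2^2) is nonnegative exactly when (x1-1)^2 + (x2-1)^2 <= 2, which is the circle described in the lecture.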
[00:11:41] That is exactly the circle we drew, and everything else is classified as minus 1. So we successfully got the decision boundary to be a circle. [00:11:54] Okay, so let me try to take one more step to reconcile this tension between linear in phi of x and non-linear in x. So what we're going to do here is the following. Remember, in the input space x, this decision boundary is a circle, and in feature space you can see that the decision boundary is a line. So here is a cool animation that I found on YouTube which I think really nicely illustrates this. It's done in the context of SVMs, but the idea is the same. Here we have points inside the circle and outside the circle; in the ambient x space they're not separable. What we're going to do is apply the feature map, and the feature map, remember, adds this third dimension, x1 squared plus x2
squared, and now we're in feature space, which is 3D. And in 3D we can actually slice a linear predictor that separates the red and the blue points, and that separation induces a circle in the original 2D space. [00:13:10] Okay, to summarize: linear is ambiguous. We have a predictor, in the case of regression, which is w dot phi of x. It's linear in w and in phi of x, but it's non-linear in x, and this is what allows us to get non-linear predictors using the machinery of linear predictors. We saw that for regression, non-linearity refers to the predictor directly, and for classification it refers to the decision boundary. We also saw many types of non-linear features: quadratic features, piecewise constant features, periodic features, and again, you can make up your own features for the application you have in mind. [00:13:50] So next time someone on the street asks you about
linear predictors, you first have to clarify: linear in what? [00:13:57] Okay, that's the end.

================================================================================ LECTURE 010 ================================================================================
Artificial Intelligence & Machine Learning 7 - Feature Templates | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=2QfSBLtvioE
---
Transcript

[00:00:05] Hi, in this module I'm going to talk about how to use feature templates to organize and construct your features in a very flexible way. So recall that a hypothesis class is the set of all predictors that a learning algorithm is going to consider. In the case of linear predictors, we've looked at predictors where f of x is equal to, in the case of regression, w dot phi of x, or in the case of classification, the sign of that quantity, and we allow the weight vector w to vary freely. [00:00:41] Okay, so we can visualize the hypothesis class as follows. Imagine the space of all possible predictors, all possible functions mapping x to y. When you define a feature extractor phi,
what you're doing is committing to a particular subset of all possible predictors, and usually you do this using prior knowledge. The second part is the learning algorithm: given the hypothesis class, script F, you're asking the learning algorithm to choose a particular predictor from that set based on data. [00:01:21] So intuitively, we want the hypothesis class script F to contain the good predictors, of course. It can also contain some bad ones, because those will be filtered out on the basis of data, but we don't want it to be so big that the learning algorithm has trouble identifying the good predictors among the bad predictors. [00:01:45] So let's look at an example task, and I want to give you an idea of how to choose the feature extractor. Suppose you're given a string such as abc@gmail.com and you're asked to predict
whether this is a valid email address or not, using a linear classifier. So in this case, what we have to do is identify the feature extractor phi. When you're designing a feature extractor, the main question you ask yourself is: what properties of the input x might be relevant for predicting y? Of course, you don't necessarily want to commit to a particular aspect being important, because you don't know that; you want to learn that from data. But you should give the learning algorithm some guidance. [00:02:37] So what we're going to do is define the feature extractor as: given x, produce a set of (feature name, feature value) pairs. In this particular example, the feature extractor is going to produce a feature vector, and in this case we might look at the length: is it greater than 10? Here that's 1, because length has
something to do with whether it's a valid address. The fraction of alphanumeric characters: 0.85 in this case. Contains an at sign: that's 1, because it does contain an at sign. Ends with com: that's 1 here. And does it end with dot org: that's 0 here. [00:03:22] Okay, so this is a feature vector that we might construct for this particular application. [00:03:30] So now we go to prediction. Remember that we've previously defined the feature vector to just be a real vector; it's just a list of numbers. So what we've done right now is to annotate, or comment, each component of that feature vector with a name that describes what that component is about. We can do the same thing with the corresponding weight vector: here is a weight vector, just a list of numbers, and we can annotate each component with the name of the corresponding feature. [00:04:04] And recall that the score is just the dot product, w dot phi of x.
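The feature extractor just described can be sketched as a small function returning (feature name, feature value) pairs. This is a hypothetical sketch: the feature names and the threshold of 10 follow the lecture's description, but the exact naming is my own.

```python
# Minimal sketch of the email-address feature extractor from the lecture.
def extract_features(x: str) -> dict:
    return {
        "length>10":          1.0 if len(x) > 10 else 0.0,
        "fracOfAlphanumeric": sum(c.isalnum() for c in x) / len(x),
        "contains_@":         1.0 if "@" in x else 0.0,
        "endsWith_com":       1.0 if x.endswith("com") else 0.0,
        "endsWith_org":       1.0 if x.endswith("org") else 0.0,
    }

print(extract_features("abc@gmail.com"))
```

On "abc@gmail.com" this yields length>10 = 1, a fraction of alphanumeric characters of 11/13 (about 0.85, matching the lecture), contains_@ = 1, endsWith_com = 1, and endsWith_org = 0.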
[00:04:09] Just to write out the dot product: it's a sum over all the features, or components, of wj, the weight of that feature, times the feature value. Okay, so here's an example: the weight of length greater than 10 is minus 1.2, the feature value is 1, so you have that product here, and you have all the other features. [00:04:37] So a little piece of intuition here: you can think about the score as follows. Remember, in classification, positive scores result in a positive classification and negative scores result in a negative classification. You can think of each feature as providing a vote. If, let's say, phi of x sub j is 1 and wj is positive, that means it's voting in favor of a positive classification, and if wj is negative, it's voting in favor of a negative classification, and the magnitude of wj determines the strength of the vote.
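The voting intuition can be made concrete with the dictionary representation. A minimal sketch: only the weight of -1.2 for length>10 comes from the lecture; the other weights here are made up for illustration.

```python
# Feature values for "abc@gmail.com" (from the running example) and a
# weight vector; all weights except length>10 = -1.2 are hypothetical.
features = {"length>10": 1.0, "fracOfAlphanumeric": 0.85,
            "contains_@": 1.0, "endsWith_com": 1.0, "endsWith_org": 0.0}
weights  = {"length>10": -1.2, "fracOfAlphanumeric": 0.6,
            "contains_@": 3.0, "endsWith_com": 2.2, "endsWith_org": 1.4}

# score = sum_j w_j * phi(x)_j: each feature casts a vote whose sign
# comes from w_j and whose strength is |w_j| * phi(x)_j.
score = sum(weights[name] * value for name, value in features.items())
label = 1 if score >= 0 else -1
print(score, label)
```

Here the single negative vote from length>10 is outweighed by the positive votes, so the classification comes out positive.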
[00:05:15] So that's another way to interpret the dot product. Previously we saw that we can interpret it as the cosine of the angle, which is a more geometric interpretation. [00:05:28] So far we've seen that we can take inputs, define arbitrary feature extractors, get our feature vectors out, and do learning. But how do we choose these features? I just kind of made up at, com, and org. Which ones do we include? So far we've used some prior knowledge, but it's very easy in this manner to miss some; what about other suffixes, for example? We need a more systematic way of doing this, and this is where feature templates come in. A feature template is simply a group of features, all computed in a similar way. [00:06:08] So here's an example. With the input abc@gmail.com, we're
going to write the feature template as simply an English description with a blank, and that blank is meant to be filled in with an arbitrary value: last three characters equals blank. By instantiating that blank with all sorts of different values, we begin to realize the features that are actually defined by this feature template. [00:06:44] The important part here is that we no longer have to say which suffixes are important; we don't have to say what particular patterns to look at. We just have to know that there exists some suffix that might be important, define this feature template, and let the learning algorithm sort out which of these many features are actually relevant. [00:07:11] So to continue this example: the input is abc@gmail.com, and we define this feature template, which can be instantiated by substituting something like dot com.
we can also define this other feature [00:07:23] um we can also define this other feature template length greater than blank [00:07:26] template length greater than blank and we can type plug in one two three [00:07:28] and we can type plug in one two three four five six seven eight nine ten and [00:07:30] four five six seven eight nine ten and so on into that [00:07:31] so on into that um some feature templates [00:07:33] um some feature templates are don't have a blank and that's okay [00:07:35] are don't have a blank and that's okay because that just corresponds to specify [00:07:38] because that just corresponds to specify one single feature and that has a [00:07:40] one single feature and that has a particular value [00:07:42] particular value so here's another example so suppose the [00:07:45] so here's another example so suppose the input is an aerial image along with some [00:07:48] input is an aerial image along with some metadata about the location [00:07:50] metadata about the location so you can go figure out where this [00:07:51] so you can go figure out where this actually is [00:07:52] actually is um [00:07:53] um so the feature template in this case we [00:07:56] so the feature template in this case we might want to look at the following so [00:07:59] might want to look at the following so we want to look at the pixel intensity [00:08:01] we want to look at the pixel intensity of this image at a particular row [00:08:05] of this image at a particular row and a particular column [00:08:07] and a particular column and it's a color image so there's three [00:08:10] and it's a color image so there's three channels rgb so we for identify a [00:08:13] channels rgb so we for identify a particular channel that we're looking at [00:08:15] particular channel that we're looking at so this might be instantiated as the [00:08:17] so this might be instantiated as the pixel intensity of image at row 10 and [00:08:19] pixel intensity of image at row 10 and column 
[00:08:25] Another feature template might look at the metadata, the location, and be a feature on whether the latitude is in a particular range and the longitude is in a particular range. This feature template gets instantiated with particular values that denote ranges. If you remember piecewise constant features, this is an example of them: it carves up the world into a bunch of regions and has a feature firing if the lat-long is in a particular region or not. [00:09:09] So one thing you might notice is that feature templates are pretty flexible, but sometimes they can give rise to a lot of features. Last character equals blank: there are already 26 features if you only include lowercase letters. And
furthermore, most of these feature values are zero. In these cases, this is what we mean when we say a feature vector is sparse, and you can represent sparse feature vectors more compactly, as a dictionary mapping the feature name to the actual feature value. [00:09:44] So in general, there are two ways you can represent feature vectors: one is using arrays and one is using dictionaries. If your feature vector looks like this, which is dense, not sparse, meaning the feature values are mostly non-zero, then you might want to just represent it as an array: order the features somehow and just list out the numbers. But in cases where your feature vector looks like this and has lots of zeros, then it will be more efficient to represent it as a dictionary, where again you specify the feature name, colon, the feature value of only the non-zero elements, and by convention anything that is not mentioned has a value of zero.
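With that convention, the dot product only needs to touch the non-zero entries. A minimal sketch of a sparse dot product over dictionary-represented vectors:

```python
# Sparse dot product: iterate only over the nonzero entries of the
# feature dictionary; any key absent from the weights is treated as 0.
def sparse_dot(weights: dict, features: dict) -> float:
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())

w   = {"endsWith_com": 2.2, "contains_@": 3.0}
phi = {"endsWith_com": 1.0, "endsWith_moc": 1.0}  # unknown feature: weight 0
print(sparse_dot(w, phi))   # 2.2
```

The cost is proportional to the number of features that actually fire, not to the total number of features the templates could ever generate.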
[00:10:31] So one interesting advantage of sparse features is that you don't have to instantiate all the features a priori, in advance. As data comes in, you only lazily build up these features over time, whereas if you were doing things in a dense way, you would have to pre-define the fixed set of features that you're going to be working with. Now, in recent years, with deep learning, dense features and arrays have become much more ubiquitous, partly because you can take advantage of fast matrix computations on the GPU. [00:11:10] So to summarize: we want to identify hypothesis classes, and in this case we're looking at defining the hypothesis class with respect to the feature extractor. To define the feature extractor, we use feature templates, which are a convenient shorthand for unrolling a
single feature template into a bunch of different features. We also saw that in some cases the feature vectors are sparse, and therefore you can use a dictionary implementation to be more efficient. [00:11:43] Okay, so that's the end of this module. Thanks.

================================================================================ LECTURE 011 ================================================================================
Artificial Intelligence & Machine Learning 8 - Neural Networks | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=pnKXgBHuN58
---
Transcript

[00:00:05] Hi, in this module I'm going to talk about neural networks, a way to construct non-linear predictors via problem decomposition. [00:00:14] So when we started, we talked about linear predictors, and they were linear in two ways. First, the feature vector was a linear function of x, and second, the way that the feature vector interacted with the prediction was also linear. This gave rise to lines. [00:00:31] Next we talked about non-linear predictors, keeping the same linear machinery but just playing around
with the feature vector: by adding terms like x squared, you could get quadratic predictors and so on. [00:00:46] So now what we're going to do is define neural networks, where we can just leave phi of x, the feature vector, alone, and play with the way that the feature vector results in the prediction. That will allow us to get all sorts of fancy stuff. [00:01:05] So let me begin with a motivating example. Suppose you're trying to predict whether two cars are going to collide or not. The inputs are the positions of the two cars: x1 is the position of car 1, and x2 is the position of car 2.
And what you'd like to output is y: y = 1 if it's safe, and y = -1 if they collide. [00:01:34] What is unknown to the learner is the true rule: we're going to say that cars are safe if they're sufficiently far apart, so if the distance between them is at least 1, then we're safe. You can visualize this true predictor as follows. Here are axes x1 and x2, and you're going to draw these two lines; any point that is over here, or anything that is over here, is going to be labeled plus, which is safe, and anything in between is going to be labeled minus, meaning they will collide. [00:02:19] Okay, so let's do some examples. Suppose we have the point (0, 2), which is this point here; this is safe, so y = 1. (2, 0) is also safe, and
(0, 0) is here, which is not safe, and (2, 2) is y = -1, which is also not safe. [00:02:48] As an aside, this configuration of points is what was historically known as the XOR problem, and it was shown that pure linear classifiers could not solve it: you couldn't draw a line to separate the blue and the orange points. Nonetheless, we're going to show how neural networks can be used to solve this. [00:03:11] Okay, so the key intuition is the idea of problem decomposition. Instead of solving the problem all at once, we're going to decompose it into two subproblems. First, we're going to test whether car 1 is to the far right of car 2; in the picture, that corresponds to simply this region over here, which we're going to call h1. So h1 is whether x1 - x2 >= 1.
Then we're going to define another subproblem testing whether car 2 is to the far right of car 1, which is called h2; that corresponds to this region over here. And then we're going to predict safe if at least one of them is true: we just add the two values, each of which is either one or zero, and if at least one of them is one, we return +1. By convention, we're going to assume that the sign of zero is -1. [00:04:14] Okay, so here are some examples. Suppose we have (0, 2) again. For this point, h1 says nope, that's not on my side; h2 says yep, that's on my side; and at least one yes is enough to make the prediction +1. If you take (2, 0), that's this point: h1 says yep, h2 says nope, and f is 1, because all it takes is one. For (0, 0), this point, both of them say no, and it's -1.
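This true function can be sketched directly in code (a minimal illustration; the names h1, h2, f follow the lecture's notation):

```python
def h1(x1, x2):
    # Subproblem 1: is car 1 to the far right of car 2?
    return 1 if x1 - x2 >= 1 else 0

def h2(x1, x2):
    # Subproblem 2: is car 2 to the far right of car 1?
    return 1 if x2 - x1 >= 1 else 0

def f(x1, x2):
    # Predict safe (+1) if at least one subproblem fires;
    # by convention, the sign of 0 is taken to be -1.
    return 1 if h1(x1, x2) + h2(x1, x2) > 0 else -1

# The four example points from the lecture:
for point in [(0, 2), (2, 0), (0, 0), (2, 2)]:
    print(point, f(*point))   # (0,2) and (2,0) give +1; (0,0) and (2,2) give -1
```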
And same with (2, 2): both of them say no, so it's -1. [00:04:55] Okay, so far we've just defined the true function f. Of course, we don't know f, so what we're going to do is move gradually toward defining a hypothesis class, and the next step is to rewrite f using vector notation. So here are the two intermediate subproblems, and the predictor is f(x), which takes the sign. What we're going to do is write h1 in terms of a dot product between a weight vector and a feature vector. Here's the feature vector: [1, x1, x2]. Then we're going to define a weight vector, which is [-1, 1, -1], and if you look at the dot product, it's -1 + x1 - x2; if that quantity is greater than or equal to 0, then we're going to return 1, otherwise we return 0.
And you can verify that this is exactly a rewrite of that expression. Similarly, if you reverse the roles of x1 and x2, you can rewrite h2 in vector notation as well. [00:06:10] Now what we're going to do is combine h1 and h2 by stacking them: we form this matrix, which is just the two weight vectors stacked up, so we have two rows here, and we multiply this matrix by the feature vector. Remember, left multiplication by a matrix is just taking the dot product with each of the rows of that matrix. This produces a two-dimensional vector, and we test whether each component is greater than or equal to zero, so in the end h(x) is going to be a two-dimensional vector. [00:06:54] Okay, and now, given that, we can rewrite the predictor as simply the sign of the dot product between [1, 1]
and h(x), which is simply the sum of the two components. [00:07:07] So now we've written f(x), the true function, in terms of a bunch of matrix and vector multiplies. Everything in red here is just numbers, and so far we've specified what they are, but in general we're not going to know them, and we're going to have to learn them from data. [00:07:32] But before we do that, let's preemptively look at one problem that's going to come up, a problem we saw before when we tried to optimize the zero-one loss. So let's look at the gradient of h1(x) with respect to v1. We can plot this as follows: here is the score z, which is the dot product, and this is h1, which is just a step function. The step function, or threshold function, is just whether z is greater than or equal to zero: it's one over here and zero over here.
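The stacked-matrix form of the same true function can be sketched as follows (a rough illustration; the h2 row is obtained, as described, by swapping the roles of x1 and x2, and the top-level weights are [1, 1]):

```python
import numpy as np

def f(x1, x2):
    phi = np.array([1, x1, x2])          # feature vector [1, x1, x2]
    V = np.array([[-1, 1, -1],           # row 1: weight vector for h1
                  [-1, -1, 1]])          # row 2: weight vector for h2
    h = (V @ phi >= 0).astype(int)       # threshold each component -> h(x)
    score = np.array([1, 1]) @ h         # dot product with [1, 1]
    return 1 if score > 0 else -1        # sign, with sign(0) = -1

print([f(*p) for p in [(0, 2), (2, 0), (0, 0), (2, 2)]])  # [1, 1, -1, -1]
```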
Okay, so now if you try to do gradient descent on this, you're just going to get stuck, because the gradients are going to be 0 basically everywhere. [00:08:20] So the solution is to replace this threshold function with a more general activation function sigma, which has friendlier gradients. [00:08:30] Classically, and by classic I mean like in the 80s and 90s, people used the logistic function as the activation function, which looks like this; it's just kind of a smoothed-out version of the threshold function. In particular, its gradient is zero nowhere, which is just great: you can always move and make progress. There is a caveat here, which is that if you look out here, this function is pretty flat, which means the gradient is actually approaching zero; so if you're out here, you can get stuck, or at least make very
slow progress. [00:09:14] Then in 2012 the ReLU activation became popular, which just takes the max of z and zero, so it looks like this: if the input to the ReLU is less than zero, I'm just going to clip it to zero, and otherwise I'm going to leave it alone. Now this function actually has nice gradients over here: the gradient never vanishes, it's always positive and bounded away from zero, although over here it is zero. It turns out empirically that the ReLU activation function works really well, and it's simpler in a lot of ways, so it's kind of become the activation function of choice. [00:10:02] So the solution here is to replace this threshold step function with an activation function; choose your favorite, I would choose the ReLU, and now you have something that has non-vanishing gradients.
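As a small sketch, the two activation functions under discussion are just:

```python
import math

def logistic(z):
    # Classic smooth activation: 1 / (1 + e^(-z)). Its gradient is nonzero
    # everywhere, but approaches 0 for large |z| (the flat regions).
    return 1 / (1 + math.exp(-z))

def relu(z):
    # ReLU: clip negatives to zero, leave positives alone. The gradient
    # is exactly 1 for z > 0 and 0 for z < 0.
    return max(z, 0)

print(logistic(0))          # 0.5, the midpoint of the smoothed step
print(relu(-3), relu(2))    # 0 2
```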
[00:10:24] So let's now define two-layer neural networks using the machinery that we've developed so far. Okay, we're going to define some intermediate subproblems. We start with a feature vector φ(x); now, I'm going to represent vectors and matrices using these dots, so this is a six-dimensional feature vector, but in general it's d-dimensional. Next I'm going to multiply it by this weight matrix, which here is 3-by-6, but in general is a k-by-d matrix. That generates a three-dimensional, or in general k-dimensional, vector, and I'm going to send it through this non-linearity, the activation function, like the ReLU or the logistic, and we get a vector which I'm going to call h(x). [00:11:14] Okay, so now, given this h(x), I can do prediction by taking h(x) and simply dot-producting it with a weight vector w, and if I take the sign, that gives
me the prediction of that neural network. [00:11:34] So one thing that's kind of interesting here is that if you look at this equation, it pretty much looks like the equation for a linear classifier; the only difference is that now we have h(x) instead of φ(x). So one way to interpret what neural networks are doing is that, instead of using the original feature vector, we've kind of learned a smarter representation, and at the end of the day we're still doing a linear classification on top of that feature representation. People often think about neural networks as doing feature learning for precisely this reason. [00:12:10] And finally, now we can define the hypothesis class F as the set of all predictors, where a predictor is parameterized by a weight matrix V and a weight vector w, defined up here. We can let the weight matrix be any arbitrary k-by-d matrix, and we
let w be any d-dimensional vector; sorry, this d should actually be k, I will fix that. [00:12:45] Okay, so we have now defined a hypothesis class that corresponds to two-layer neural networks for classification. [00:12:55] Now we can kind of push this further, and we can talk about deep neural networks. So remember, going back to single-layer neural networks, a.k.a. linear predictors: we take the feature vector, we take the dot product with respect to a weight vector, and we get the score, which can be used to drive prediction directly in regression, or we take the sign to get classification predictions. For two-layer neural networks, we take φ(x), take the dot product with layer one's weight matrix, apply the elementwise activation function, and then take the dot product with a weight vector to get the score.
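A minimal sketch of this two-layer predictor, sign(w · σ(V φ(x))), using ReLU as the activation and, for illustration, the hand-set weights from the car example (in general V and w would be learned from data):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def predict(phi, V, w):
    # Two-layer neural network: h(x) = sigma(V phi(x)), f(x) = sign(w . h(x))
    h = relu(V @ phi)
    score = w @ h
    return 1 if score > 0 else -1      # convention: sign(0) = -1

V = np.array([[-1.0, 1.0, -1.0],       # k-by-d weight matrix (k=2, d=3)
              [-1.0, -1.0, 1.0]])
w = np.array([1.0, 1.0])               # k-dimensional weight vector

for x1, x2 in [(0, 2), (2, 0), (0, 0), (2, 2)]:
    print((x1, x2), predict(np.array([1.0, x1, x2]), V, w))
```

With these particular weights, the ReLU version still classifies all four XOR points correctly, since each hidden unit only needs to fire on its own side.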
[00:13:41] And now the key thing is that this piece, apply V and then apply sigma, is something you can just iterate over and over again. So here's a three-layer neural network: you take big φ(x), the feature vector, multiply by some matrix V1, apply a non-linearity, multiply by another matrix, apply a non-linearity, and finally you get some vector that you take the dot product with w, and you get the score, which can be used to power your predictions. [00:14:11] One small note: I've left out all the bias terms for notational simplicity; in practice, you would have bias terms. [00:14:21] Okay, and you can imagine just iterating this over and over again, but what is this doing? It kind of looks like a little bit of abstract nonsense: you're just multiplying by matrices and sending through non-linearities, and you hope something good happens.
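The "multiply by a matrix, apply sigma" iteration can be sketched as a loop (the weights here are made-up numbers for illustration; bias terms omitted, as in the lecture):

```python
import numpy as np

def score(phi, Vs, w, sigma=lambda z: np.maximum(z, 0)):
    # Deep network: repeatedly left-multiply by a weight matrix and apply
    # the activation; finally dot with the weight vector w to get the score.
    h = phi
    for V in Vs:                       # e.g. [V1, V2] gives a three-layer network
        h = sigma(V @ h)
    return float(w @ h)

# A tiny deterministic three-layer example:
phi = np.array([1.0, -2.0])            # d = 2 feature vector
V1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])            # 3-by-2
V2 = np.array([[1.0, 1.0, 1.0]])       # 1-by-3
w = np.array([2.0])
print(score(phi, [V1, V2], w))         # 2.0
```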
And you know, that's not completely false, but there are some intuitions which we can derive. [00:14:45] So one intuition is thinking about layers as representing multiple levels of abstraction. In computer vision, let's say the input is an image: you can think about the first layer as computing some sort of notion of edges; the second layer, when you multiply by a matrix and take a non-linearity, computes some notion of object parts; and in the third layer, you multiply by a matrix and apply some non-linearity and get some notion of objects. [00:15:21] Now, this is kind of just a story, and we haven't talked at all about learning, so this is definitely not true for all neural networks. It turns out that when you actually fit a network to data and visualize what the weights are, you actually do get some interpretable results, which is kind of interesting and somewhat
surprising. [00:15:42] So now there's a question of depth: the fact that you take a feature vector and apply some sort of transformation again and again to get a score. So why do we do this? One intuition that we've talked about already is that this represents different levels of abstraction, from kind of low-level pixels to high-level object parts and objects. Another way to think about this is that it's performing multiple steps of computation: just like in a classic program, if you get more steps of computation, it gives you more expressive power, you can do more things. You can think about each of these operations as simply doing some compute. Now, it's maybe a kind of foreign type of compute, because you're multiplying by a crazy unknown matrix, but the way we can think about this is that you set up this
computation, and the learning algorithm is going to figure out what kind of computation makes sense for making the best predictions. [00:16:46] Another piece of intuition is that empirically it just happens to work really well, which is not to be understated. If you're looking for a more theoretical reason, the jury is kind of still out on this: you can have intuitions about how deeper logical circuits can capture more than shallower ones, but then the relationship between circuits and neural networks requires a little bit of massaging, so this is still a pretty active area of research. [00:17:23] So to summarize: we started out with a very toy problem, the XOR problem, testing whether two cars are going to collide or not, and we used it to motivate problem decomposition and, eventually, defining neural networks. [00:17:39] We saw that
intuitively, neural networks allow you to define nonlinear predictors, but in this particular way: the way is to decompose the original problem into intermediate subproblems, testing whether one car is to the far right or the far left, and then combining them. And you can kind of take this idea further and iterate on this decomposition multiple times, giving rise to multiple levels of abstraction, multiple steps of computation. The hypothesis class is now larger: it contains all predictors where the weights of all the layers can vary freely. [00:18:23] Next up, we're going to show you how to actually learn the weights of a neural network. That is the end.
================================================================================ LECTURE 012 ================================================================================ Machine Learning 9 - Backpropagation | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=OcAF-l2xB9Y --- Transcript
[00:00:05] Hi, in this module I'm going to talk
about the backpropagation algorithm for computing gradients automatically. It's generally associated with training neural networks, but it's actually a far more general algorithm. [00:00:18] So let's begin with our motivating example: suppose we're doing regression with a four-layer neural network. So remember, we compute the loss on a given example, the loss with respect to a particular example, and now, as a function of the weights of the network, V1, V2, V3, and w, it is equal to the following. Remember the form of the neural network: you start with a feature vector, you multiply it by some weight matrix, which gives you a vector, and you send it through the activation function; you repeatedly apply this, apply a matrix, send it through an activation function, left-multiply by a matrix, send it through an activation function. Then you take the final vector and take the dot product with respect to the final weight vector, and
that gives you your [00:01:12] weight vector and that gives you your score [00:01:13] score so this is your prediction [00:01:16] so this is your prediction subtract [00:01:18] subtract the target value and you square it that [00:01:21] the target value and you square it that gives you your loss [00:01:24] gives you your loss so now if you wanted to train this [00:01:26] so now if you wanted to train this neural network using stochastic gradient [00:01:28] neural network using stochastic gradient descent you would need to [00:01:31] descent you would need to compute the gradient of this loss [00:01:33] compute the gradient of this loss function with respect to each of the [00:01:35] function with respect to each of the parameters so for example [00:01:38] parameters so for example would compute the gradient of the loss [00:01:40] would compute the gradient of the loss with respect to v1 that gives you a [00:01:42] with respect to v1 that gives you a gradient update which you can then use [00:01:44] gradient update which you can then use to update v1 same with v2 [00:01:47] to update v1 same with v2 3 [00:01:48] 3 and w [00:01:50] and w so now you can sit down with this lovely [00:01:53] so now you can sit down with this lovely expression [00:01:54] expression and you can just grind through the map [00:01:56] and you can just grind through the map and get the expressions it's [00:01:58] and get the expressions it's straightforward but it's rather tedious [00:02:01] straightforward but it's rather tedious another question is how can you get the [00:02:03] another question is how can you get the gradients without all doing all this [00:02:05] gradients without all doing all this manual work [00:02:09] so the answer to that is computation [00:02:12] so the answer to that is computation graphs [00:02:13] graphs so here is our loss function again [00:02:16] so here is our loss function again and what we're going to do is [00:02:17] and what we're going to do is write 
down the computation graph for [00:02:20] write down the computation graph for this mathematics computation graph is a [00:02:23] this mathematics computation graph is a direct acyclic graph whose root node [00:02:25] direct acyclic graph whose root node represents the final expression this [00:02:28] represents the final expression this loss function [00:02:29] loss function and each node [00:02:32] and each node represents intermediate sub expressions [00:02:34] represents intermediate sub expressions like one v of x for example [00:02:38] like one v of x for example now what this computation graph is going [00:02:40] now what this computation graph is going to allow us to do is [00:02:42] to allow us to do is allows us to apply the back propagation [00:02:44] allows us to apply the back propagation algorithm to the computation graph and [00:02:46] algorithm to the computation graph and automatically get gradients out [00:02:50] automatically get gradients out so there's two purposes actually that [00:02:53] so there's two purposes actually that we're going to do this the first is [00:02:56] we're going to do this the first is computing the gradients automatically [00:02:58] computing the gradients automatically and this is how deep learning packages [00:03:00] and this is how deep learning packages like tensorflow and pi torch work behind [00:03:03] like tensorflow and pi torch work behind the hood [00:03:04] the hood and second more of all [00:03:06] and second more of all we're going to use this as a tool to [00:03:08] we're going to use this as a tool to gain insight into the modular structure [00:03:11] gain insight into the modular structure of the gradients and try to demystify [00:03:14] of the gradients and try to demystify because taking gradients by hand you can [00:03:15] because taking gradients by hand you can lead it into situations where you just [00:03:17] lead it into situations where you just have a lot of symbols [00:03:19] have a lot of 
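As a concrete sketch of the motivating example, here is the forward computation of that four-layer loss in numpy. The layer sizes and numbers below are made-up assumptions, just to make the shapes line up:

```python
import numpy as np

def sigma(z):
    # Logistic activation, applied element-wise.
    return 1.0 / (1.0 + np.exp(-z))

def four_layer_loss(V1, V2, V3, w, phi_x, y):
    # Loss = (w . sigma(V3 sigma(V2 sigma(V1 phi(x)))) - y)^2
    h1 = sigma(V1 @ phi_x)
    h2 = sigma(V2 @ h1)
    h3 = sigma(V3 @ h2)
    score = np.dot(w, h3)   # the prediction
    return (score - y) ** 2

# Toy sizes (assumptions, for illustration only).
rng = np.random.default_rng(0)
phi_x = rng.standard_normal(4)
V1 = rng.standard_normal((3, 4))
V2 = rng.standard_normal((3, 3))
V3 = rng.standard_normal((3, 3))
w = rng.standard_normal(3)
print(four_layer_loss(V1, V2, V3, w, phi_x, y=1.0))
```

Writing out the gradient of this expression with respect to V1 by hand is exactly the tedious exercise the lecture is about avoiding.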
But using a graph, we can start to see the structure.

[00:03:26] Okay, so our starting point is to think about functions as boxes. Imagine you have this expression a + b, and that gives rise to some variable c. I'm going to represent this as a very simple computation graph, where you have a and b, and these arrows point into this box that does plus, and the result is labeled c. Okay, so now the question is: if I change a or b by a small amount, how much does c change? This is just the notion of a gradient. So informally, we can look at this as a + b = c. Now if I go and fiddle with a a little bit, I add epsilon; what happens to the right-hand side? Well, on the right-hand side I just get plus 1·epsilon. So the gradient of c with respect to a is 1, and I'm just going to write it on that edge. So this can be interpreted as a kind of amplification, or gain: if I move a by a little bit, this is the multiplicative factor that c moves by. Let's do the other side: a plus b, and you add a bit of noise to b, and again you get plus 1·epsilon, so the gradient of c with respect to b is 1 as well.

[00:05:09] Here's another example: c equals a times b. As a computation graph, a and b go into this box, which takes the product, and you get c. So what happens when you add epsilon noise to a? (a + epsilon) times b is equal to c plus b·epsilon, so therefore the gradient of c with respect to a is b. And analogously, we add the noise to b and we see that the contribution to the output c is a·epsilon, so the gradient there is a. Okay, so this all
should be kind of familiar: I've just cast the sum and product rules for differentiation in graphical form.

[00:06:11] So let's do a few more small examples. These small examples are going to be the building blocks; it turns out you can take these building blocks and compose them to build all sorts of more complicated functions. So here's the example we saw before, a + b, and the gradients are 1 and 1. For a − b, the gradients are 1 and −1, because if you add epsilon to b, then the difference is going to go down by epsilon. We saw the example a times b, and the gradients are b and a. If you look at the squared function a², the gradient with respect to the input is 2a, the power rule. Now let's consider max(a, b). Let's think about this one: if I add epsilon to a, how is that going to change the max? Well, if a is greater than b, then it's just going to change the max by epsilon; but if a is less than b, then the change is going to be 0, because b is going to be the max. So the gradient of max(a, b) with respect to a is the indicator function of whether a is greater than b, and symmetrically, the gradient with respect to b is the indicator of whether b is greater than a.

[00:07:49] And finally, here is the logistic function: you take a and send it through the logistic function σ, and a little bit of algebra, which I'll spare you, produces a quite elegant expression, which is σ(a) times (1 − σ(a)). And you can check that as a goes to infinity or minus infinity, the sigmoid is going to saturate at 1 or 0, which means this gradient is actually going to go to zero.
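This cheat sheet of building-block gradients can be checked against finite differences. A minimal sketch in plain Python (the inputs and step size are arbitrary choices):

```python
import math

def sigma(z):
    # Logistic function.
    return 1.0 / (1.0 + math.exp(-z))

# Cheat sheet: each entry is (function, analytic gradients w.r.t. a and b).
blocks = {
    "add": (lambda a, b: a + b,     lambda a, b: (1.0, 1.0)),
    "sub": (lambda a, b: a - b,     lambda a, b: (1.0, -1.0)),
    "mul": (lambda a, b: a * b,     lambda a, b: (b, a)),
    "max": (lambda a, b: max(a, b), lambda a, b: (float(a > b), float(b > a))),
}

a, b, eps = 1.5, -0.7, 1e-6
for name, (f, grad) in blocks.items():
    ga, gb = grad(a, b)
    # Finite-difference approximation of each gradient.
    fa = (f(a + eps, b) - f(a, b)) / eps
    fb = (f(a, b + eps) - f(a, b)) / eps
    assert abs(fa - ga) < 1e-4 and abs(fb - gb) < 1e-4, name

# Unary blocks: square (gradient 2a) and logistic (gradient sigma(a)(1 - sigma(a))).
assert abs(((a + eps) ** 2 - a ** 2) / eps - 2 * a) < 1e-4
assert abs((sigma(a + eps) - sigma(a)) / eps - sigma(a) * (1 - sigma(a))) < 1e-4
print("all building-block gradients check out")
```

Each assertion is exactly the "fiddle with the input by epsilon" experiment from the boxes above.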
So that's just a simple sanity check. Okay, so these are the basic building blocks, and that's really all the brute-force differentiation we're going to do; the rest is just composition.

[00:08:40] So now we take these building blocks and we put them together. Here's a simple example: suppose you take a and square it, you get b, and then you take b and square it, and you get c. By the building blocks from the previous slide, we know that the gradient on this edge is going to be 2 times the input here, which is b, and the gradient along this edge is going to be 2 times a. Okay, so now, using these two, we can apply the chain rule from calculus to compute the gradient of c with respect to a, and this is going to be nothing more than the product of those two quantities. So in this case we get 2b times 2a, and remember that b is equal to a², so substitute that in and you get 4a³. And remember c is a⁴, so we can verify that this is indeed consistent with using the power rule.

[00:09:50] So in general, you can compute these gradients by simply taking the product along edges, and that's going to be really useful on this slide. Okay, so now let's turn to our first example: the hinge loss for linear classification. We actually did this one before, but I just want to do it again through the lens of computation graphs. So here is the loss function, Loss(x, y, w) = max(1 − (w · φ(x)) y, 0), and given this loss function I'm going to construct the computation graph and then compute the gradient of the loss with respect to w. So working bottom up, we have the weight vector and we have the feature vector, and we take the dot product; that gives us the score. We take the score and we take y, and we multiply
them together, and that gives us the margin. Then we compute 1 minus the margin, and you take the max of that and 0, and you get the loss. So another nice thing about the computation graph is that it allows you to annotate these subexpressions and see what the pieces of the computation are.

[00:11:04] Okay, so now let us compute the gradient of the loss with respect to w. And what I'm going to do here is: all I need to do is compute the gradients along all these edges from the loss down to w. So let's begin at the top. Here is our cheat sheet; don't forget the cheat sheet. We just pattern match. Here's a max over two things; what's on this edge? The indicator of the first thing being greater than the second thing. So the gradient here is going to be [first thing, which is 1 minus the margin, greater than the second thing, which is 0]. And what about this edge? Here there's a minus 1, so let's put a minus 1. What about this times? For times, the gradient is the second input, and the second input here is y. And here's another times, where the second input is φ(x). So this allows us to think about the gradients one piece at a time, and all the little edges are just applications of this cheat sheet.

[00:12:20] Okay, now we're ready to read off the gradient of the loss with respect to w, and this is just going to be the product of all the edges. So first we have [1 minus the margin greater than 0], which I'm going to rewrite as [margin less than 1]; verify that's the same thing. Then we have the minus sign here, and then we have y, and then we have φ(x). You multiply them all together, and that's the expression. You can verify that this is indeed the gradient of the loss function.
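Reading off the product of edges gives ∇w Loss = −1[margin < 1] · y · φ(x). A small numpy sketch with a finite-difference check (the example numbers are arbitrary):

```python
import numpy as np

def hinge_loss(w, phi, y):
    # Loss = max(1 - (w . phi) y, 0)
    return max(1.0 - np.dot(w, phi) * y, 0.0)

def hinge_grad_w(w, phi, y):
    # Product along the edges: 1[margin < 1] * (-1) * y * phi(x)
    margin = np.dot(w, phi) * y
    return -float(margin < 1.0) * y * phi

w = np.array([0.5, -0.2])
phi = np.array([1.0, 2.0])
y = 1.0  # label in {+1, -1}

g = hinge_grad_w(w, phi, y)
eps = 1e-6
for i in range(len(w)):
    # Perturb one coordinate of w and compare to the analytic gradient.
    w_eps = w.copy()
    w_eps[i] += eps
    fd = (hinge_loss(w_eps, phi, y) - hinge_loss(w, phi, y)) / eps
    assert abs(fd - g[i]) < 1e-4
print(g)
```

Note that when the margin is at least 1, the indicator is 0 and the whole gradient vanishes, which matches the flat part of the hinge.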
[00:13:06] So in summary: we constructed the computation graph, we applied this cheat sheet to the individual edges, and then you just multiply them all together.

[00:13:20] And just as another note, remember that the gradient with respect to w is really about perturbations: if you change w by a little bit, how much is the loss going to change? And the change is going to be the product of all these amplifications, evaluated at a particular point.

[00:13:45] All right, so now let's do neural networks. This is not going to be anything really new; it's just going to be a different example. I'm going to do a two-layer neural network, and we're going to again build the computation graph up. So we have the feature vector, you have the first-layer weight matrix V, and you take the product. Then you stick this through the activation function, and we're going to label that h, which is the hidden vector. And now we're going to take the dot product of w and h; that gives you the score. And then the score minus y is the residual, and the residual squared is the loss.

[00:14:39] Another aside is that the computation graph really allows you to see this modularity visually. The part up here is just the squared loss, and the part down here is any way of computing a score. Before we had a linear predictor; now we have a two-layer neural network; it could be a four-layer neural network, and the top of the computation graph is just the same. Okay, so that's the computation graph. Now, to perform stochastic gradient descent, we need to compute the gradient with respect to both w and V.
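Before computing those gradients, here is a sketch of this two-layer forward computation in numpy (the matrix and vector values are assumptions, chosen only for illustration):

```python
import numpy as np

def sigma(z):
    # Logistic activation, element-wise.
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_loss(V, w, phi_x, y):
    h = sigma(V @ phi_x)      # hidden vector
    score = np.dot(w, h)      # the prediction
    residual = score - y
    return residual ** 2

V = np.array([[0.1, 0.2], [0.3, -0.1]])
w = np.array([1.0, -1.0])
phi_x = np.array([2.0, 1.0])
print(two_layer_loss(V, w, phi_x, y=1.0))
```

Swapping the body of `two_layer_loss` below the squared-loss line is exactly the modularity point: the top of the graph does not care how the score was computed.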
[00:15:23] Okay, so let's compute the gradient with respect to w of the loss. What I'm going to do is look at the edges and compute the gradients. So here's our cheat sheet. Okay, what goes on this edge, the gradient of the square? That is just 2 times the input, which in this case is 2 times the residual. What about this edge? The residual is the score minus y, so with respect to the score this should just be a 1 on here. And then what about this edge? This is just going to be the second input right here, so that is h. Okay, so now multiply all these things together and you get the gradient of the loss function with respect to w.

[00:16:24] All right, so one thing you can double check: we did do the gradient of the squared loss for linear predictors before, and it was also 2 times the residual
times the feature vector; except instead of φ(x), we now just have h, which is a kind of stand-in for the feature vector as far as w is concerned. So that's a nice sanity check.

[00:16:51] All right, so now let's do the more complicated one: we want to compute the gradient with respect to V of the loss of all the arguments. Let's fill in all the edges. So first of all, notice that these two edges are actually in common with this path, so we can go ahead and write them down. One cool thing about computation graphs is that they allow you to see the shared structure: the gradients themselves also have common subexpressions.

[00:17:31] Okay, so now we need to do more work here. The gradient on this edge is going to be the other input, which is w. This one is σ, so the gradient is going to be σ of the input times 1 minus σ of the input, and this is going to just be h ∘ (1 − h). This hollow circle here represents the element-wise product of vectors: you just take two vectors and multiply the elements together. And this is because this function is applied element-wise. And then what about this final edge? This is just going to be φ(x), which is just the other input.

[00:18:20] And now we can just multiply all of these things together, so we have 2 times the residual, times w ∘ h ∘ (1 − h), times φ(x) transpose. There's a slight annoyance here, because here we have a φ(x) transpose, whereas before there was no transpose, because we just had w dot something, and w dot is the same as w transpose. But the high level is that the product of all of these green pieces yields the gradient of the loss with respect to V. All right, so that finishes up this example.
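Putting the pieces together for this two-layer example, here is a sketch of both gradients, read off as products along the edges, with a finite-difference check (the values are arbitrary):

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(V, w, phi_x, y):
    h = sigma(V @ phi_x)
    return (np.dot(w, h) - y) ** 2

def gradients(V, w, phi_x, y):
    # Forward pass, remembering the intermediate values.
    h = sigma(V @ phi_x)
    residual = np.dot(w, h) - y
    # Products along the edges of the computation graph.
    grad_w = 2 * residual * h
    grad_V = 2 * residual * np.outer(w * h * (1 - h), phi_x)
    return grad_w, grad_V

V = np.array([[0.1, 0.2], [0.3, -0.1]])
w = np.array([1.0, -1.0])
phi_x = np.array([2.0, 1.0])
y = 1.0

grad_w, grad_V = gradients(V, w, phi_x, y)

# Finite-difference check on one entry of each gradient.
eps = 1e-6
w_eps = w.copy(); w_eps[0] += eps
assert abs((loss(V, w_eps, phi_x, y) - loss(V, w, phi_x, y)) / eps - grad_w[0]) < 1e-4
V_eps = V.copy(); V_eps[1, 0] += eps
assert abs((loss(V_eps, w, phi_x, y) - loss(V, w, phi_x, y)) / eps - grad_V[1, 0]) < 1e-4
print("two-layer gradients match finite differences")
```

The `np.outer` call is where the φ(x) transpose shows up: the gradient with respect to a matrix V is itself a matrix, one entry per weight.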
[00:19:15] So now, we have mainly used this graphical representation to visualize the computation of function values and gradients. But the promise of back propagation is that we don't have to do any of that by hand at all; I just did that to illustrate the inner workings of gradient computations on the computation graph. Now we're going to introduce the back propagation algorithm, which is a general procedure for computing these gradients, so we never have to worry about it.

[00:19:47] I'm going to demonstrate back propagation on a simple example, which is just the squared loss for linear regression. And one note: previously we've worked with symbolic expressions, but the actual algorithm is going to operate on numbers. So what I'm going to do is work with a concrete example and walk through the back propagation algorithm on it.

[00:20:19] The back propagation algorithm consists of two steps: a forward step and a backward step. In the forward step, we compute a bunch of forward values, going from the leaves to the root, and each forward value is simply the value of the subexpression rooted at that node. The value could be a scalar, a vector, or a matrix. So let's walk through this example. At the leaves we have w, which is [3, 1], and we have the feature vector, which is [1, 2]. Now if you take these two quantities and take the dot product, you get 3 plus 2, which is 5. And now you take the score, 5, and you take y, you subtract them, and you get the residual, which is 3. Notice that the forward value of this node is 5 and the forward value of this node is 3. And now, finally, you
square this, [00:21:31] and the value of the square is 3 squared, which is 9, so that's the value at this node. [00:21:40] Okay, so now we're done with the forward phase. All we've done is evaluate the loss, but importantly we have also remembered all the values along the way, which will come in handy. [00:21:54] So now, in the backward step, we're going to compute a backward value g_i for every node, [00:22:05] and this is going to be the gradient of the loss with respect to the value at that node: if that node changes value, how does the loss change? [00:22:16] So the backward pass is going to compute the values from the root to the leaves. [00:22:21] Let's do this first for this example. The base case: the gradient of the loss with respect to the loss is one. [00:22:30] And now we look at the gradient on this edge; we did this before, it's just two times the residual. [00:22:38] Okay, so now
we need to compute the backward value of this node. [00:22:45] To do that, we're going to take the backward value of the parent and multiply by whatever is on this edge. What's on this edge is two times the residual; the residual is three, so it's two times three, which is six, and so one times six is six. [00:23:04] Notice that in computing this backward value I'm using the intermediate computations from the forward pass. [00:23:13] Okay, so let's continue. The gradient on this edge is 1, so the backward value here is 6, which is the parent's backward value times what's on this edge, which is 1; that gives us 6.
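The two passes walked through here can be reproduced numerically. A minimal sketch, assuming y = 2 (the lecture gives the score as 5 and the residual as 3, which fixes y):

```python
import numpy as np

# Worked example from the lecture: loss = (w . phi - y)^2,
# with w = [3, 1] and phi = [1, 2]; y = 2 is assumed (score 5, residual 3).
w = np.array([3.0, 1.0])
phi = np.array([1.0, 2.0])
y = 2.0

# Forward pass: compute and remember the value at every node.
score = w.dot(phi)        # 3*1 + 1*2 = 5
residual = score - y      # 5 - 2 = 3
loss = residual ** 2      # 3^2 = 9

# Backward pass: multiply edge gradients from the root down to the leaves.
d_loss = 1.0                         # base case: dloss/dloss = 1
d_residual = d_loss * 2 * residual   # edge gradient of squaring: 2 * residual
d_score = d_residual * 1.0           # edge gradient of the subtraction: 1
d_w = d_score * phi                  # edge gradient of the dot product: phi

print(loss)  # 9.0
print(d_w)   # [ 6. 12.]
```

Note how the backward pass reuses `residual`, a value remembered from the forward pass.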
And then the backward value of this node is 6 times what's on this edge, which is this other input, (1, 2), and that gives us (6, 12). [00:23:41] So to conclude, the backpropagation algorithm takes these concrete values and this expression, and produces the gradient of the loss with respect to w evaluated at these concrete values, and that's (6, 12). [00:24:00] And the backpropagation algorithm, remember, works for any computation graph: four-layer neural networks, much more complicated models. This is just a simple example to show you the dynamics of the forward pass and the backward pass. [00:24:16] Okay, so now that we have the backpropagation algorithm, we compute gradients, we stick these gradients into stochastic gradient descent, and then we just run SGD with those gradients and we get some weights. [00:24:33] So now one question is, what do we get? We wanted to optimize the training
loss using stochastic gradient descent, [00:24:42] but if we run stochastic gradient descent, does it actually minimize this loss? This is a little bit of a delicate question. [00:24:52] For linear predictors, it turns out that the training loss for a convex loss function is going to be a convex function, which means that it is going to have a single global minimum, [00:25:09] which means that if you start at some point and you just follow your nose by running gradient descent with an appropriate step size, it's going to converge to the global optimum. [00:25:19] But for neural networks, the training loss is non-convex, which means that there are no guarantees at all that you're going to converge to the global minimum; if you're lucky, you converge to a local minimum. [00:25:35] So optimization of neural networks is in principle hard, but of course people do it anyway, and you
actually get some good results. [00:25:46] So there's a gap between theory and practice which is not quite understood yet. [00:25:54] But in practice, getting neural networks to train properly is a little bit of an art. I think of it as kind of like driving stick: there are just a lot of degrees of freedom, you can stall and get stuck, but if you know what you're doing you can actually get a lot of good results. [00:26:12] So here are some examples just to give you a flavor of what needs to be done. [00:26:18] Here is a neural network and here is the loss function. [00:26:23] The first point is that initialization matters. If you have a convex function, wherever you initialize, if you run it for long enough you converge to the global optimum. [00:26:32] For a non-convex function, if you initialize here you might get stuck up here; if you initialize over here you'll get stuck here, and so on. [00:26:39] So generally you have to be a
little bit careful about how you initialize. [00:26:42] You can't initialize at zero, because it turns out that all the rows of your weight matrix are going to be identical, which is not very useful. [00:26:56] So you typically initialize around zero with some amount of random noise, or you can use pre-training to initialize your network as well, which we won't cover right now. [00:27:10] Another thing that people do is called overparameterization. Here this corresponds to adding more hidden units than you really need, which corresponds to having a lot of rows in this matrix. [00:27:26] And the idea here is that the more hidden units you have, the more quote-unquote chances you have of having the network learn something reasonable from your data. Some of the units might die off and not be very useful, but maybe some fraction of them will actually be useful.
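The point about zero initialization leaving all rows of the weight matrix identical can be demonstrated directly. A minimal sketch with a tiny one-hidden-layer network; the sigmoid activation, squared loss, and all the numbers here are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Tiny network: score = w . sigmoid(V x), loss = (score - y)^2.
x, y = np.array([1.0, 2.0]), 1.0
V = np.zeros((4, 2))   # symmetric initialization: all rows identical
w = np.ones(4)

for _ in range(100):
    h = sigmoid(V @ x)
    residual = w @ h - y
    # Backprop by hand through the squared loss.
    d_h = 2 * residual * w
    d_V = np.outer(d_h * h * (1 - h), x)
    d_w = 2 * residual * h
    V -= 0.1 * d_V
    w -= 0.1 * d_w

# Every row of V received the identical gradient at every step,
# so all four rows are still identical after training:
print(V)
```

Replacing the zero initialization with small random noise, for example `V = 0.01 * np.random.default_rng(0).standard_normal((4, 2))`, breaks this symmetry and lets the hidden units specialize.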
And the final thing that people do is use adaptive step sizes, which are generally extensions of stochastic gradient descent. [00:27:55] Remember, in stochastic gradient descent we had a single step size eta which controlled how fast you move. [00:28:02] With methods like AdaGrad or Adam, you actually get a per-feature, or per-parameter, step size, so for every weight you get a number which dictates how fast you should be moving in that direction, and this generally leads to better results. [00:28:22] Okay, so one maybe high-level thing to keep in mind is: don't let your gradients vanish or explode. [00:28:30] If I explain this, it will become kind of clear. When you run gradient descent or stochastic gradient descent, if your gradients vanish, which means they become too small or close to zero, then you'll get stuck and you won't make progress.
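The per-parameter step size idea behind AdaGrad can be sketched as follows. This is a simplified AdaGrad update with toy gradients (Adam adds momentum-style averaging on top, which is omitted here):

```python
import numpy as np

def adagrad_update(w, grad, sum_sq, eta=0.1, eps=1e-8):
    """One AdaGrad step: each weight's effective step size is
    eta / sqrt(sum of its squared past gradients)."""
    sum_sq = sum_sq + grad ** 2
    w = w - eta * grad / (np.sqrt(sum_sq) + eps)
    return w, sum_sq

w = np.zeros(2)
sum_sq = np.zeros(2)
for _ in range(10):
    grad = np.array([10.0, 0.1])   # toy gradients: coordinates differ by 100x
    w, sum_sq = adagrad_update(w, grad, sum_sq)

# Despite the 100x difference in gradient magnitude, the accumulated
# normalization makes both coordinates move at essentially the same rate.
print(w)
```

This is the sense in which every weight gets its own number dictating how fast to move in that direction.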
But if your gradients become too large, then you'll just explode, and you will oscillate and might diverge. [00:28:56] So careful initialization, careful setting of the step sizes, and even the design of the neural network architecture: all of this is around making sure that your gradients don't vanish or explode. [00:29:11] Okay, so that's all the guidance I'll provide; there's a lot more to be said on this topic, and we're just giving you a high-level overview. [00:29:22] Okay, so let's summarize. The most important topic of this module is that of a computation graph. [00:29:32] This allows you to represent arbitrary mathematical expressions, and these expressions are built out of simple building blocks. I hope that the idea of computation graphs will allow you to get a better visual
understanding of what your mathematical expressions are doing, and also what gradient computations are about. [00:29:54] And then we saw the backpropagation algorithm, which is this general-purpose algorithm for leveraging the computation graph to compute the gradients. [00:30:06] So notice that we've done this in the context of neural networks, but I stress that computation graphs and backpropagation are fully general: they allow you to handle many, many functions. [00:30:21] And the generality is one of the reasons it allows you to iterate very quickly on new types of models and loss functions, and it opens up this new paradigm for model development, differentiable programming, which we'll talk about in a future module. [00:30:39] All right, that's it. Thanks.
================================================================================ LECTURE 013 ================================================================================
Machine Learning 10 - Differentiable Programming | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=c5btEEisp_g
---
Transcript
[00:00:05] Hi,
in this module I'm going to briefly introduce the idea of differentiable programming. [00:00:10] Differentiable programming kind of just runs off with the ideas of computation graphs and backpropagation that we developed for simple neural networks. [00:00:19] There's really enough to say here to fill up an entire course, at least, so I'm going to try to keep things pretty high level, but I will try to highlight the power of composition. [00:00:32] So differentiable programming is closely related to deep learning; I've adopted the former term as an attempt to be more precise in highlighting the mechanics of writing models as you would code. [00:00:48] If you look around at deep learning today, there are some pretty complex models, which have many layers, attention mechanisms, residual connections, to name a few, and this could be quite overwhelming at first glance. [00:01:01] But when you look closer, you'll notice
that these complex models are actually composed of functions, and these functions themselves are composed of smaller functions. [00:01:10] So this is the programming part of differentiable programming, which allows you to build up increasingly sophisticated models without losing track of what's going on. [00:01:24] So let's begin with our familiar example, the three-layer neural network. [00:01:31] Remember that in a three-layer neural network we start with our feature vector, in this case a six-dimensional vector, and we left-multiply by a matrix. [00:01:41] I've drawn some lines here to help us interpret this matrix as a set of rows, so each row corresponds to a hidden unit, and I'm going to take the dot product of each row with the input vector to produce a hidden vector of dimension four. [00:02:00] I'm going to add a bias term, and then I'm going to apply an activation function element-wise, for
example the ReLU or the logistic. [00:02:10] Now I have a vector, and now I can do the same thing again: I apply a matrix, add a bias term, apply an activation function, then apply a matrix which happens to be a vector, so I get a scalar, and I add a simple scalar bias term and I get a score, which I can then happily use to drive regression, or take the sign of to drive classification. [00:02:37] So what I want to do now is factor out this complex-looking expression into a reusable component which I'm going to call FeedForward. [00:02:49] We're going to see a lot of these box diagrams, which represent functions that we can reuse and that have a nice interface. [00:02:58] So the FeedForward function takes in an input vector x and produces an output vector, which could be of a different dimensionality, [00:03:09] and the way to interpret what FeedForward is doing is that it performs one step of processing. In particular, what
that processing is, is taking this input vector, multiplying by a matrix, adding a bias term, and applying an activation function. [00:03:28] So this is a function, or a program, but unlike in normal programming it's underspecified, because the red numbers here are parameters which are private to this function and which are going to be set and tuned later via backpropagation. [00:03:46] So now we can write our three-layer neural network using FeedForward, and the way I'm going to do this is: the score is equal to, you take phi of x, and you apply FeedForward of FeedForward of FeedForward, and you can write this as FeedForward cubed, so as to be more compact. [00:04:12] So this is a very compact way of writing something that would otherwise be quite complicated. [00:04:23] So now let's suppose we want to do image classification. We need some way of representing images.
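The FeedForward block and the FeedForward-cubed composition can be sketched in code. A minimal sketch; the dimensions, the ReLU choice, and the random initialization are assumptions, and unlike the lecture's final layer (a plain dot product), the last block here also applies ReLU for uniformity:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_feedforward(d_in, d_out):
    """One FeedForward block: x -> relu(W x + b).
    W and b are the private 'red number' parameters, to be tuned later."""
    W = 0.01 * rng.standard_normal((d_out, d_in))
    b = np.zeros(d_out)
    return lambda x: np.maximum(0, W @ x + b)

# Three-layer network as a composition: score = FeedForward^3(phi(x)).
f1 = make_feedforward(6, 4)   # six-dimensional feature vector -> hidden vector of 4
f2 = make_feedforward(4, 4)
f3 = make_feedforward(4, 1)   # collapse to a scalar score

phi_x = np.ones(6)
score = f3(f2(f1(phi_x)))[0]
print(score)
```

Each call to `make_feedforward` holds its own private parameters, mirroring the underspecified boxes in the diagram.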
The FeedForward function that we just introduced takes a vector as input, [00:04:35] and we can represent an image as a long vector, for example by appending all the rows. [00:04:42] But then we would have this huge matrix that we would need in order to transform this vector, resulting in a lot of parameters, which may make things difficult. [00:04:54] And the problem here is that we're not really using the spatial structure of images. For example, if I just permuted all the elements of this vector and retrained, I would basically get the identical model, so it's not paying attention to which pixels are close by. [00:05:17] To fix this problem, we introduce convolutional neural networks, which are a refinement of the fully connected network. [00:05:25] So here is an example of a ConvNet in action. [00:05:31] Here's a car, and you can see that it goes through a
number of layers, and over time it computes increasingly abstract representations of the image; at the end you get a vector representing the probabilities of the different object categories. [00:05:51] So if you want to play with ConvNets, you can actually click here for Andrej Karpathy's excellent demo, where you can create and train ConvNets in your browser. [00:06:03] Another comment: we're going to introduce ConvNets for 2D images, but they can also be applied to text or sequences, which are 1D, or videos, which are 3D. [00:06:18] So ConvNets have two basic building blocks. We're not going to go through the details; you can take CS231N if you want to learn all about ConvNets. Instead, I'm going to focus on the interface and show how these modules compose. [00:06:37] So the first is conv, and conv takes an image, and the image is going to be represented
as a volume, which is a collection of matrices, one for each channel: red, green, blue. [00:06:50] Each matrix has the same dimensionality as the image, height by width. [00:06:57] And what conv is going to do is compute another volume of a slightly different size. Usually the height and width of this volume are going to be equal to, or maybe slightly smaller than, the input volume, and the number of channels is going to be somewhat different. [00:07:16] The way that conv is going to compute this volume is via a sequence of filters, and intuitively what it's going to do is try to detect local patterns. [00:07:30] So here is one filter, and how it works is that I'm going to slide this filter across the image. [00:07:39] If I put the filter here, I'm going to line it up with the first pixels of the image, [00:07:47] and I'm going to compute the dot product between the
twelve numbers here and the twelve numbers here; I get a single number which I'm going to write into this entry. I slide the filter over a little bit, I write into the second entry, and so on. [00:08:06] And then the second filter I'm going to use to fill up the second output channel, so the number of filters is the number of output channels. [00:08:14] Okay, so that's all I'm going to say about conv. [00:08:18] The second operation is max pool, which again takes an input volume, and it produces a smaller output volume. [00:08:28] It's going to have the same number of channels, and for every slice through the volume it's going to slide a little max operation over every two-by-two or three-by-three region, [00:08:42] so the max over these four numbers is going to be used for this number, and so on. [00:08:50] Okay, so that's all I'm going to say about max pool.
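The conv and max pool interfaces just described can be sketched for a single channel. This is a toy sketch; real ConvNets slide filters over multi-channel volumes, and the 4x4 image and the filter here are arbitrary illustrative choices:

```python
import numpy as np

def conv2d_single(image, filt):
    """Slide one filter across a 2D image (valid positions only), taking the
    dot product at each position -- one filter fills one output channel."""
    H, W = image.shape
    fh, fw = filt.shape
    out = np.zeros((H - fh + 1, W - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+fh, j:j+fw] * filt)
    return out

def max_pool(channel, k=2):
    """Take the max over each k-by-k region, shrinking height and width."""
    H, W = channel.shape
    out = np.zeros((H // k, W // k))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = channel[i*k:(i+1)*k, j*k:(j+1)*k].max()
    return out

image = np.arange(16.0).reshape(4, 4)
filt = np.array([[1.0, -1.0]])          # toy filter detecting horizontal change
feat = conv2d_single(image, filt)       # local-pattern detection
pooled = max_pool(feat)                 # aggregation / downsampling
print(feat.shape, pooled.shape)         # (4, 3) (2, 1)
```

The two functions mirror the two roles the lecture names: conv detects local patterns, max pool aggregates them into a smaller volume.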
If you want to drill into the details, you can check out this demo, or you can learn more in CS231N. But again, I want to highlight that there are these two modules: one for detecting patterns and one for aggregating, to kind of reduce the dimensionality. [00:09:13] And with these two functions, along with feed-forward, now we can define AlexNet, which was the seminal CNN from 2012 that won the ImageNet competition and really transformed computer vision. So how this works is: I'm going to start with my input image, apply a convolutional layer, apply max pool, apply another convolutional layer followed by max pool, apply three more convolutional layers followed by max pool, and then apply three layers of feed-forward. Okay, so in one line I have AlexNet.
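The layer sequence above can be sketched as a composition of placeholder stages; the stage names below are stand-ins that just record the order in which they run, since the real parameters and shapes are unspecified here.

```python
# Schematic of the AlexNet-style composition: conv/pool alternating,
# then feed-forward layers. Layer counts follow the description above.
def make_pipeline(layers):
    def run(x):
        for layer in layers:
            x = layer(x)
        return x
    return run

conv = lambda name: (lambda x: x + [name])   # placeholder stages that
pool = lambda x: x + ["maxpool"]             # just record what ran
feedforward = lambda x: x + ["feedforward"]

alexnet = make_pipeline([
    conv("conv1"), pool,
    conv("conv2"), pool,
    conv("conv3"), conv("conv4"), conv("conv5"), pool,
    feedforward, feedforward, feedforward,
])
trace = alexnet([])
```

Running the pipeline on an empty list yields the execution order as a list of stage names.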
Now, of course, I've underspecified a couple of things here. One is I haven't specified the parameters; those are to be learned, and each of these functions holds its own private set of parameters that need to be learned. The second thing is I also haven't specified the hyperparameters, which is the number of channels, the filter sizes, and so on, which are actually pretty important for getting good performance. But I just wanted to highlight the overarching structure and the idea that you can compose in a fairly effortless way. [00:10:29] So now let's turn our attention to natural language processing. Here's a motivating example: suppose we want to build a question answering system. We have a paragraph (it's from Wikipedia), we have a question, and we want to select the answer from that passage, from the paragraph. This happens to be from the SQuAD question answering benchmark. So let's just read this: "In meteorology, precipitation is any product of the condensation of atmospheric water vapor that falls under gravity."
And the question is: what causes precipitation to fall? And the answer is gravity. So to do question answering, you have to do a fair amount of processing. You somehow have to relate the question with the paragraph, but it's not an exact match. Some of the words match, like "precipitation", but some of them are kind of more subtle; "causes", for example, is somehow related to "product". And there's also the fact that some words are ambiguous: "product" can be multiplication or output. So there's a lot of processing that needs to happen, and it's hard to kind of specify in advance. [00:11:54] So first things first: words are discrete objects, and neural networks speak vectors. So whenever you're doing NLP with neural nets, you first have to embed the words, or more generally tokens.
So we're going to define an embedToken function that takes a word or a token x and maps it into a vector. And all this function is going to do is look up the vector in a dictionary that has a static set of vectors associated with particular tokens. So this is fine, and if you have a sequence of words, then you can just embed each word into a vector to get a sequence of vectors. There's one problem, which is that the meaning of words and tokens depends on context, so this representation of the sentence is not going to be a particularly sophisticated one. [00:13:01] So what we're going to do is define an abstract function. Borrowing terminology from programming, an abstract function is something that has an interface but not an implementation. So a sequence model is going to be something that takes a sequence of input vectors and produces a corresponding sequence of output vectors.
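A minimal sketch of the embedToken lookup described above; the vocabulary and the 4-dimensional vectors are invented for illustration.

```python
import numpy as np

# embedToken as a static dictionary lookup, per the description above.
rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat"]
embedding_table = {w: rng.normal(size=4) for w in vocab}

def embed_token(x):
    # just look up the static vector associated with this token
    return embedding_table[x]

def embed_sequence(tokens):
    # embed each word independently -> a sequence of vectors
    return [embed_token(t) for t in tokens]

vectors = embed_sequence(["the", "cat", "sat"])
```

Note that each vector here is context-independent, which is exactly the limitation the sequence model is meant to fix.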
Each vector in this sequence is processed with respect to the other elements; in other words, I want to contextualize these vectors using the sequence models. I'm going to talk about two implementations of sequence models: one is recurrent neural networks, and one is transformers. So historically, recurrent neural networks have been around since the early 90s, and since 2011 or so they became really kind of the dominant paradigm for doing deep learning NLP. Transformers came out in 2017 and have really, I guess, transformed the landscape of deep learning NLP. [00:14:18] So an RNN, or a recurrent network, can be thought of as reading a sentence left to right; that's a kind of intuitive way to think about it. So we have a word which gets mapped into a vector.
That produces some hidden state. And then I'm going to read a second input vector, and I'm going to update this hidden state, along with this thing that I just read, into a new hidden state. And then I'm going to read another input vector, update the state, and repeat, again and again. Okay, so at the end of the day I have a sequence model, because it maps an input sequence into an output sequence. And notice that each vector here now depends on not just its input vector but everything to the left: if you look at h3, h3 depends on x3, x2, and x1 through this computation graph. So the intuition, again, is reading left to right, updating a hidden state as you go along; it's kind of like a memory. One thing I haven't specified is this function that takes an old hidden state and an input and updates the hidden state.
[00:15:47] So I'm going to do that next. There are two types of implementations I'm going to talk about; one is a simple RNN. So the contract here is: I'm going to have an old hidden state and an input, and we want to generate a new hidden state of the same dimensionality. And the way a simple RNN works is: I take the hidden state and multiply it by a matrix, take the input and multiply it by a matrix, add these two, and apply an activation function. So it's fairly simple, and one other way to think about this is that this is really the feed-forward function applied to the concatenation of h and x. Okay, so one problem with a simple RNN is that it suffers from the vanishing gradient problem: if you have long sequences, then the gradients start vanishing.
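The simple RNN update just described can be sketched as follows; the matrix names (V for the hidden state, U for the input) and the dimensions are illustrative choices, not the lecture's notation.

```python
import numpy as np

# Simple RNN cell: multiply the old hidden state by one matrix, the
# input by another, add, and apply a nonlinearity (tanh here).
rng = np.random.default_rng(0)
d_h, d_x = 3, 4                   # hidden and input sizes, arbitrary
V = rng.normal(size=(d_h, d_h))   # acts on the old hidden state
U = rng.normal(size=(d_h, d_x))   # acts on the new input

def simple_rnn_cell(h, x):
    return np.tanh(V @ h + U @ x)

# reading a sequence left to right, updating the hidden state as we go
h = np.zeros(d_h)
xs = [rng.normal(size=d_x) for _ in range(5)]
hs = []
for x in xs:
    h = simple_rnn_cell(h, x)
    hs.append(h)
```

Each hidden state has the same dimensionality as the previous one, so the cell can be applied for as many steps as the sequence is long.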
So LSTMs, or long short-term memory, were developed to solve this problem. [00:16:55] And the way this works is: the interface is the same, and the implementation is some rather involved thing that I'm not going to explain. But intuitively, you should black-box it and think about LSTMs as just a way to update the hidden state given a new input, but without forgetting the past. Remember, up here, for the simple RNN we could think of it as this feed-forward on x and h, which are treated kind of equally. LSTMs kind of privilege h and make sure that h doesn't get forgotten while going through this. Okay, so now we have our sequence model, an RNN, which produces a sequence of vectors, and the number of vectors depends on how long the input sequence is. [00:17:53] So suppose we want to do classification; we need to somehow collapse that into a single vector.
So I'm going to define this function collapse, which takes a sequence of vectors and returns a single vector. You can intuitively think about this as summarizing the collection of vectors as one. There are three common things you can do: you can simply take the first vector, you can take the last vector, or you can take the average of all the vectors. If you're doing text classification, you probably want to pick the average, so as to not privilege any individual word; but as we'll see later, if you're trying to do language modeling, you want to take the last.
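A sketch of collapse with the three options just mentioned:

```python
import numpy as np

# collapse: sequence of vectors -> single vector (first, last, or average)
def collapse(vectors, how="average"):
    vectors = np.asarray(vectors)
    if how == "first":
        return vectors[0]
    if how == "last":
        return vectors[-1]
    return vectors.mean(axis=0)  # average: no individual word privileged

seq = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
avg = collapse(seq)                # average of the three vectors
last = collapse(seq, how="last")   # what language modeling would want
```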
[00:18:34] So here is an example text classification model that we can develop. The score (let's say for binary classification) is going to be equal to: take the input sequence of tokens, embed all the tokens into a sequence of vectors, and now you can apply a sequence model, for example the sequence RNN. And you can do this three times; that gives you depth, just like we talked about for feed-forward networks. And now you can collapse that into a single vector and take a dot product to get a number out. So these types of functions, where the input and output have the same type signature, are really handy, because then you can compose them with each other and get multiple steps of computation. So recurrent neural networks generally work fairly well, but they suffer from one problem, which is that they're fairly local. So this is a problem that we're going to try to address with transformers. [00:20:02] So introducing transformers is fairly involved, so I'm going to step through and introduce a few things before actually defining it. The core part of a transformer is the attention mechanism.
The attention mechanism takes in a collection of input vectors and a query vector, and then outputs a single vector. Intuitively, what attention is doing is it's going to process y by comparing it to each of these x's. Okay, so mathematically, what this is doing is: you start with the query vector, and I'm going to multiply it by a matrix to reduce its dimensionality, in this case from six to three. I'm also going to take the X transpose, where each row here is one of the input vectors x1, x2, x3, x4, and I'm going to reduce its dimensionality, also to three dimensions. And now I can take the dot product between these x's and y. So that's going to give me a four-dimensional vector of dot products, intuitively measuring the similarity between the x's and the y. So now I can take those scores and I can turn them into probabilities by taking a softmax.
A softmax exponentiates the scores and normalizes them into a probability distribution. So now I have a distribution over the input vectors x1, x2, x3, x4; it's a four-dimensional vector. I can use those probabilities as weights when I multiply by X, to take a weighted combination of the columns of X here. So for intuition, if one of the inputs has a very high probability, let's say it's (0, 0, 1, 0), then I'm just going to pick out the third input vector. In general this is a distribution, so this is kind of like softly picking out which input vector is similar to y. Okay, and then finally I'm going to reduce the dimensionality to some lower-dimensional object.
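Putting the steps together, one attention head might be sketched like this in NumPy; the projection-matrix names and random values are stand-ins, while the sizes (four input vectors of dimension six, projected down to three) follow the running example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 4, 6, 3
X = rng.normal(size=(n, d))       # rows are input vectors x1..x4
y = rng.normal(size=d)            # query vector
Wq = rng.normal(size=(k, d))      # reduces y from 6 to 3 dims
Wk = rng.normal(size=(k, d))      # reduces each x from 6 to 3 dims
Wv = rng.normal(size=(k, d))      # final dimensionality reduction

def softmax(s):
    e = np.exp(s - s.max())       # exponentiate and normalize
    return e / e.sum()

def attention(X, y):
    scores = (X @ Wk.T) @ (Wq @ y)    # 4 dot products: similarity to y
    probs = softmax(scores)           # distribution over x1..x4
    chosen = probs @ X                # softly pick out a similar input
    return Wv @ chosen                # reduce to a lower dimension

out = attention(X, y)
```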
multiple attention [00:22:55] allows us to use multiple attention heads so i'm going to repeat this [00:22:57] heads so i'm going to repeat this process again [00:22:59] process again taking the query vector taking the input [00:23:02] taking the query vector taking the input vector comparing them getting a [00:23:04] vector comparing them getting a distribution over the input vectors and [00:23:06] distribution over the input vectors and using that distribution reweight the [00:23:08] using that distribution reweight the input vector so i'm selecting out softly [00:23:10] input vector so i'm selecting out softly in a vector and i multiply it by matrix [00:23:14] in a vector and i multiply it by matrix to reduce the dimensionality i've done [00:23:16] to reduce the dimensionality i've done this twice but in general you can do [00:23:18] this twice but in general you can do this um any [00:23:19] this um any for [00:23:20] for 16. [00:23:22] 16. so now i concatenate these vectors so i [00:23:24] so now i concatenate these vectors so i have a four dimensional vector from this [00:23:27] have a four dimensional vector from this computation four dimensional vector from [00:23:29] computation four dimensional vector from this computation i can concatenate them [00:23:31] this computation i can concatenate them into a eight dimensional vector and now [00:23:34] into a eight dimensional vector and now i can reduce the dimensionality back to [00:23:36] i can reduce the dimensionality back to the original [00:23:37] the original dimensionality that of the of the inputs [00:23:41] dimensionality that of the of the inputs okay so that was a kind of a very [00:23:44] okay so that was a kind of a very involved uh you know process but at the [00:23:46] involved uh you know process but at the end of the day you can think about this [00:23:49] end of the day you can think about this as taking y comparing it with the x's [00:23:52] as taking y comparing it with the x's and 
Okay, so that was a kind of very involved process, but at the end of the day, you can think about this as taking y, comparing it with the x's, selecting out the one that's most similar, and doing some dimensionality reduction in the process. [00:24:02] Okay, so that's attention. The transformer uses something called self-attention, which means that the query vector is actually going to be the input vectors themselves. So self-attention takes a sequence of input vectors, and then it's going to output a same-length sequence of output vectors, where for the first vector I'm going to stick x1 into the query slot for y and compute the attention, and then x2, and x3, and x4. So each of these output vectors is comparing a particular input vector with the rest of the input vectors and doing some processing. So in other words, I've basically generated a sequence of vectors where all n-squared pairs of the objects have been allowed to communicate with each other directly.
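Self-attention can then be sketched by reusing a single head and letting each input vector take a turn as the query; choosing square projection matrices here keeps the output dimension equal to the input dimension, which is an illustrative simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 6
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def attend(X, y):
    probs = softmax((X @ Wk.T) @ (Wq @ y))
    return Wv @ (probs @ X)

def self_attention(X):
    # stick each x_i into the query slot and compute attention
    return np.stack([attend(X, x) for x in X])

out = self_attention(X)   # same-length sequence of contextualized vectors
```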
So in contrast, with RNNs we have representations that have to proceed step by step, and the number of steps is the length of the sequence, which causes these long chains, which prevents fast propagation of information, whereas attention solves this problem. So one slight comment: I've been speaking very vaguely and intuitively about these things, trying to provide as much intuition as possible, and you can't really be more precise, because I'm again not specifying the actual computation; I'm only specifying the scope of possible computations that can be done once the parameters are learned from data. Okay, so that's the attention mechanism. You can think about this as a sequence model that just takes an input sequence and contextualizes the input vectors into output vectors.
[00:26:20] So there are two other pieces I need to talk about before I can fully define the transformer: layer normalization and residual connections. These are really kind of technical devices to make the final neural network easier to train. I'm going to package them up into something called AddNorm, and it also has the type signature of a sequence model, where I have an input sequence of vectors and I spit out the corresponding set of contextualized vectors. And the intuition behind this is: I'm going to apply f to x safely. So let me explain what that means. So AddNorm of f and x is equal to: I'm first going to take x and apply f to it. Okay, so why is that not good enough? Well, remember that these functions are underspecified, so at the beginning of training they're basically not doing anything, and so they're basically kind of junk. And if this is junk, then anything that I build on top of it is also going to be pretty junky.
So what I want to do is add a residual connection. A residual connection is a kind of escape hatch that allows x to be propagated through verbatim. So that means if f is junk, at least I have x. Then I'm going to add a layer norm function on top of this. Layer normalization is just a way to make sure that this vector is not too big or too small, because big vectors and small vectors result in exploding gradients or vanishing gradients, which stall training or make training diverge. Specifically, what layer norm does on a single vector is: it treats the components as a set of elements, subtracts the mean of those elements, and divides by the standard deviation, to kind of standardize the magnitude of the vectors.
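A sketch of AddNorm as just described: apply f, add the residual so x survives even if f is still "junk" early in training, then layer-normalize each vector. The function names here are mine, not the lecture's code.

```python
import numpy as np

def layer_norm(v, eps=1e-5):
    # subtract the mean of the elements, divide by the standard deviation
    return (v - v.mean()) / (v.std() + eps)

def add_norm(f, xs):
    # apply f to each vector "safely": residual connection, then layer norm
    return np.array([layer_norm(x + f(x)) for x in xs])

X = np.random.default_rng(0).normal(size=(4, 6))
out = add_norm(lambda v: np.zeros_like(v), X)  # a "junk" f that does nothing
# even with a junk f, the residual preserves a (normalized) copy of x
```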
[00:28:31] Okay, so in summary, AddNorm with a particular function is just applying f to x safely. Okay, so now I'm finally ready to define the Transformer block. This is again a sequence model that takes a sequence of input vectors and spits out a contextualized sequence of output vectors; intuitively, it's processing each x_i in context. [00:28:59] There's only one line here; we've actually already done most of the hard work. The Transformer block on a sequence of vectors is: you take x and apply attention, which allows all the vectors to talk to each other, and you wrap that in AddNorm to do it safely; and finally you apply feed-forward to each individual resulting vector independently, and you also wrap that in AddNorm to do it safely. [00:29:35] So that's it for a Transformer block.
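Putting the pieces together, a simplified Transformer block might look like the following NumPy sketch. The attention here is a toy single-head version without learned query/key/value projections, and the feed-forward layer uses a fixed random weight matrix; both are stand-ins of my own for the real, learned modules.

```python
import numpy as np

def layer_norm(v, eps=1e-5):
    return (v - v.mean()) / (v.std() + eps)

def add_norm(f, x):
    # residual connection plus layer norm: apply f "safely"
    return [layer_norm(xi + fi) for xi, fi in zip(x, f(x))]

def attention(x):
    # Toy self-attention: every vector attends to every other via a
    # softmax over scaled dot products (no learned projections here).
    X = np.stack(x)                                   # (n, d)
    scores = X @ X.T / np.sqrt(X.shape[1])            # (n, n)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return list(w @ X)

def feed_forward(x, seed=0):
    # Applied to each vector independently: a one-layer ReLU map with
    # a fixed random weight matrix standing in for a learned one.
    d = len(x[0])
    W = np.random.default_rng(seed).normal(size=(d, d))
    return [np.maximum(0.0, W @ xi) for xi in x]

def transformer_block(x):
    # Attention first lets the vectors talk to each other (safely),
    # then feed-forward processes each resulting vector (safely).
    x = add_norm(attention, x)
    return add_norm(feed_forward, x)
```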
[00:29:43] And now we have enough that we can actually build up to BERT, which was this complicated thing that I mentioned at the beginning. BERT is this large unsupervised pre-trained model which came out in 2018 and has really transformed NLP. Before, there were a lot of specialized architectures for different tasks, but BERT was a single model architecture that works well across many tasks. [00:30:08] This is the way it works for, say, question answering: you take the question, you concatenate it with the paragraph, and that gives you just a sequence of tokens. And what BERT does on a sequence of tokens is it's going to embed the tokens, and then it's just going to apply the Transformer block 24 times.
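The stacking itself can be sketched like this; `embed_token` and the block body are toy placeholders of my own, but the point is that a block maps a sequence of d-dimensional vectors to a sequence of the same shape, so it composes with itself.

```python
import numpy as np
import zlib

D = 8  # toy embedding dimension

def embed_token(token):
    # Stand-in embedding: a deterministic random vector keyed by the token.
    return np.random.default_rng(zlib.crc32(token.encode())).normal(size=D)

def transformer_block(x):
    # Placeholder with the right type signature (sequence of vectors in,
    # sequence of the same shape out); the real block combines attention,
    # feed-forward, and AddNorm as described in the lecture.
    return [v / (np.linalg.norm(v) + 1e-5) for v in x]

def bert(tokens, num_layers=24):
    x = [embed_token(t) for t in tokens]      # embed the tokens
    for _ in range(num_layers):               # stack the same block 24 times
        x = transformer_block(x)
    return x  # one highly contextualized vector per input token
```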
[00:30:42] So again, the nice thing about having a Transformer block where the input and output have the same dimensionality and type is that you can just layer it on and get much deeper networks. [00:30:55] Okay, so at the end of the day, BERT gives you a sequence of vectors which are highly contextualized and nuanced, and which contain a lot of rich information about the sentence. From there you can either use it to drive classification, let's say binary classification, directly by collapsing the vectors into one vector, or you can use it to select out an answer to the question; I'm not going to go into the details of how that works. [00:31:30] So far we've talked about how to design functions that can process a sentence, a sequence of tokens or vectors, but we can also generate new sequences. And the basic building block for generation is what I'm going to call generate token: you take a vector x and you generate a token y. This is kind of the reverse of
embed token, which takes a token and produces a vector. [00:32:04] The way generate token is going to work is that it's actually going to use embed token as a subroutine: it's going to look at all the possible candidate words that one could generate, embed those, and take the dot product with x to get some sort of similarity between the vector and a potential candidate generation. Now we have some scores; we apply the softmax to get a distribution over possible words, and then we can generate from that probability distribution.
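Generate token, as just described, might be sketched like this; the vocabulary and the embedding function are hypothetical stand-ins of my own.

```python
import numpy as np
import zlib

def embed_token(token, d=8):
    # Toy deterministic embedding keyed by the token's bytes.
    return np.random.default_rng(zlib.crc32(token.encode())).normal(size=d)

def generate_token(x, vocab, rng=None):
    # Score each candidate by the dot product of its embedding with x,
    # softmax the scores into a distribution, and sample a word from it.
    rng = rng or np.random.default_rng(0)
    scores = np.array([embed_token(w, d=len(x)) @ x for w in vocab])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return vocab[rng.choice(len(vocab), p=probs)]
```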
[00:32:38] So here, building on top of generate token, we can do language modeling, where the input is a sequence of words and the output is the next word. This is actually fairly simple, since we already have all the essential tools. Language modeling of x is: you take x, you embed the tokens, and the crucial step is that you stick it through a sequence model. Remember, a sequence model does fancy stuff: it turns this sequence of primitive vectors into contextualized vectors, which contain more information. [00:33:22] Then it collapses them, and this time you generally want to use the last vector, because that's closest to the word that you want to generate next. And that gives you just one vector, and you can use that to generate a token.
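With toy stand-ins (a hypothetical `embed_token`, a placeholder `sequence_model`, and a greedy variant of generate token), the language model reads:

```python
import numpy as np
import zlib

def embed_token(t, d=8):
    # Toy deterministic embedding keyed by the token's bytes.
    return np.random.default_rng(zlib.crc32(t.encode())).normal(size=d)

def sequence_model(x):
    # Placeholder contextualizer with the right type signature; a real
    # one would be an RNN or a stack of Transformer blocks.
    mean = np.mean(x, axis=0)
    return [xi + mean for xi in x]

def generate_token_greedy(v, vocab):
    # Greedy variant: take the argmax instead of sampling the softmax.
    return max(vocab, key=lambda w: embed_token(w) @ v)

def language_model(tokens, vocab):
    x = [embed_token(t) for t in tokens]   # embed
    x = sequence_model(x)                  # contextualize
    last = x[-1]                           # closest to the next word
    return generate_token_greedy(last, vocab)
```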
[00:33:44] Okay, so finally, we can take language models and build on top of them to create what is known as a sequence-to-sequence model. This is perhaps one of my favorite interfaces, because it's so versatile. The basic idea is that you have an input, which is a sequence, and you are trying to generate another sequence, the output; sequences are very general, and you can use them to encode basically any sort of discrete output. [00:34:19] And the way we're going to do that is just using a language model. Remember, a language model takes the sequence and predicts the next token. So I can start out with x, query the language model to generate the next token, then attach this token to the history, query the language model again to generate the next token, and so on and so forth until I am done. [00:34:52] This is by and large how a lot of the state-of-the-art methods work, for example in machine translation, generating a translated sentence given an input sentence, or in document summarization, or semantic parsing. Each of these can be framed as sequence-to-sequence tasks, based these days usually on BERT and Transformers.
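The decoding loop just described can be sketched directly; the language model is passed in as a function, and the stop-token name is my own choice.

```python
def seq2seq(x_tokens, language_model, max_len=20, stop="<eos>"):
    # Autoregressive generation: repeatedly query the language model
    # for the next token, attach it to the history, and repeat until
    # a stop token appears (or we hit a length limit).
    history = list(x_tokens)
    output = []
    while len(output) < max_len:
        y = language_model(history)
        if y == stop:
            break
        output.append(y)
        history.append(y)
    return output
```

For machine translation, `x_tokens` would be the source sentence and `output` the translation.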
[00:35:22] Okay, so that was a really quick and high-level whirlwind tour of different types of differentiable programs from deep learning. We started with, what in hindsight seems very simple, feed-forward networks. [00:35:40] Then we looked at images and at convolutional neural networks, which were built out of convolution layers and max-pooling layers, and also feed-forward layers; the nice thing about packaging things into modules is that the feed-forward module is actually used in Transformers, in different places, as well. [00:36:00] For text and sequences, we first have to embed them into a sequence of vectors, and then we have two choices: we can either use recurrent neural networks, or we can use Transformers, which are based on attention. We can use sequence models and collapse them into a vector to drive classification decisions, or we can use them to generate new sequences as well. [00:36:34] There are many details that are glossed over; in particular,
glossed over in particular the some of the architectures have been [00:36:40] the some of the architectures have been simplified so i encourage you to consult [00:36:42] simplified so i encourage you to consult the original source if you want the kind [00:36:44] the original source if you want the kind of the actual [00:36:45] of the actual um [00:36:46] um uh the full gory details another thing i [00:36:50] uh the full gory details another thing i haven't talked about is learning any of [00:36:51] haven't talked about is learning any of these models [00:36:53] these models it's going to be using some variant of [00:36:54] it's going to be using some variant of stochastic gradient descent but there's [00:36:56] stochastic gradient descent but there's often various tricks that are needed to [00:37:00] often various tricks that are needed to get it to work [00:37:01] get it to work but maybe the final thing i'll leave you [00:37:04] but maybe the final thing i'll leave you with is the idea that [00:37:07] with is the idea that all of these of differential programming [00:37:10] all of these of differential programming which is that all of these complex [00:37:11] which is that all of these complex models are built out of modules and even [00:37:15] models are built out of modules and even if you kind of don't understand or i [00:37:16] if you kind of don't understand or i didn't explain the details i think it's [00:37:19] didn't explain the details i think it's really important to pay attention to the [00:37:21] really important to pay attention to the kind of type signature of these um [00:37:25] kind of type signature of these um of these functions [00:37:26] of these functions as well as with an intuitive idea of [00:37:30] as well as with an intuitive idea of what each of these are doing [00:37:33] what each of these are doing okay so that ends this module thanks for [00:37:35] okay so that ends this module thanks for listening 
================================================================================ LECTURE 014 ================================================================================ Artificial Intelligence & Machine Learning 11 - Generalization | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=Gq-Ah-QrOQM --- Transcript [00:00:06] Hi, in this module I'm going to be talking about the generalization of machine learning algorithms. Recall that a machine learning framework has three design decisions. The first is the hypothesis class, which could be linear predictors or neural networks. The second design decision is the loss function, which in the case of regression could be the squared loss, and in the case of classification could be the hinge or logistic loss; if you take the losses and average them, you get the training loss, which is the training objective that we've so far been optimizing. [00:00:37] And finally, we have the optimization algorithm, which is either gradient descent or stochastic gradient descent. All good so far.
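For concreteness, those pieces can be written down for linear predictors (feature vector phi, weights w); this is a minimal NumPy sketch.

```python
import numpy as np

def squared_loss(w, phi, y):
    # regression: (w . phi - y)^2
    return (w @ phi - y) ** 2

def hinge_loss(w, phi, y):
    # classification with y in {-1, +1}: max(1 - margin, 0)
    return max(1 - (w @ phi) * y, 0)

def logistic_loss(w, phi, y):
    # classification: log(1 + exp(-margin))
    return np.log(1 + np.exp(-(w @ phi) * y))

def training_loss(w, examples, loss):
    # the training objective: the average loss over the training set
    return np.mean([loss(w, phi, y) for phi, y in examples])
```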
[00:00:46] Now let's take a step back and be a little more critical: is the training loss in particular a good objective to be optimizing? [00:00:57] So here is a little cartoon example of an algorithm that does really well on the training loss; it's called rote learning. The rote learning algorithm is just going to store all the training examples, and then it's going to return this predictor: the predictor takes an input x and searches for x in the training set; if it can find it, then it returns the corresponding y, and otherwise it just gives up (it segfaults). [00:01:27] Okay, so this learning algorithm minimizes the objective perfectly: it gets zero training loss. But you can kind of tell that it's a bad idea, because it doesn't get anything else right. [00:01:42] So this was an example of extreme overfitting. Here are some examples of less extreme overfitting, in pictures.
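The rote learner takes only a few lines to write down, which is part of the joke; in this sketch, giving up is an exception rather than a segfault.

```python
def rote_learn(training_examples):
    # Store the training set verbatim and return a lookup predictor:
    # zero training loss, no ability to generalize.
    table = dict(training_examples)
    def predictor(x):
        if x in table:
            return table[x]
        raise LookupError("unseen input: giving up")  # the "segfault"
    return predictor
```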
[00:01:50] Here's an example from classification: you can see that the green decision boundary tries really hard to separate the blue and the red points, and does so successfully, getting zero training error. But you can kind of intuitively sense that it's overfitting, and perhaps this black decision boundary would be better. [00:02:12] In the case of regression, this red curve gets zero training loss by going through all the training points, but you can see that it's overfitting, and instead maybe you should be capturing the broader trend using a simple line. [00:02:29] So in general, if you try to overly optimize the training loss, then you risk overfitting to the noise in the data. [00:02:38] So then what is the true objective, if it isn't the training loss? Well, to answer that question, let's take a step back and think about what we are trying to do. Machine learning is just a means; the end is a predictor that you're going to launch into the world to make
predictions on real people; this predictor just happens to be trained by a learning algorithm. [00:03:00] So how good is this predictor in the world? Well, the answer is that how good it is depends on how well it's able to predict on unseen future examples. So our true learning objective should be to minimize the error on unseen future examples. Sounds great; the only small problem is that we don't have access to the future, and in particular, if we don't see the examples, how can we do anything about them? [00:03:33] So often we settle for the next best thing, which is to get a test set. The test set is just a set of examples that you didn't use for training, so it is a surrogate for the unseen future examples. [00:03:46] I make this distinction because I want to stress the fact that when you deploy a predictor trained by a machine learning algorithm into the world,
[00:03:56] it might encounter all sorts of crazy things, and all you have during training in the lab is a test set. So what you're trying to do is have the test set be as close to, and as representative of, what you actually get in the real world as possible. [00:04:20] So now that we have an intuitive feeling for overfitting, can we make it a little bit more precise? In particular, when does a learning algorithm generalize from the training set to the test set? Because that's kind of what we've settled for. [00:04:37] There is a way to make this mathematically rigorous, but I just want to give you the framing of how to think about generalization. [00:04:49] So the starting point is f*. This is the ideal predictor: it predicts everything as correctly as you possibly could. It lives in the family of all predictors. Of
course, we can't get to f*. So what do we do? Well, we do two things. We first define a hypothesis class, script F, and then we have a learning algorithm that finds a particular predictor f-hat within this hypothesis class. [00:05:27] Another predictor I'm going to talk about is g. This is also the kind of thing that you can't get hold of: it's the best predictor that you can find in the hypothesis class. [00:05:43] So now we're interested in the difference between the error of the thing you have and the error of the thing that you wish you had. Mathematically, that's written as the error of the learned predictor f-hat minus the error of the ideal f*, and this difference can be decomposed into two parts. [00:06:04] The first part is the approximation error. The approximation error is the difference between g and f*; mathematically, that's
the error of g minus the error of f*. This measures how good your hypothesis class is. [00:06:29] The second part is the estimation error, which is the gap between f-hat and g: the error of f-hat minus the error of g. This measures how good the learned predictor is relative to the potential of the hypothesis class. [00:06:47] And you can verify this identity, because we're just subtracting the error of g and adding the error of g back: Err(f-hat) − Err(f*) = [Err(f-hat) − Err(g)] + [Err(g) − Err(f*)], so the right-hand side is equal to the left-hand side. This trivial identity highlights these two quantities, approximation error and estimation error, and gives us a language to talk about the trade-offs in generalization.
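With made-up error numbers, the decomposition is easy to check; the values below are hypothetical, chosen only to illustrate the identity.

```python
# Hypothetical error values for the three predictors.
err_f_star = 0.05   # ideal predictor over all possible functions
err_g      = 0.10   # best predictor within the hypothesis class
err_f_hat  = 0.18   # the predictor the learning algorithm found

approximation_error = err_g - err_f_star    # how good the class is
estimation_error    = err_f_hat - err_g     # how well we learned within it

# Subtracting and adding err_g changes nothing, so the two sides match.
gap = err_f_hat - err_f_star
assert abs(gap - (approximation_error + estimation_error)) < 1e-12
```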
[00:07:09] So let's get some more intuition about how approximation and estimation error behave as you increase the size of the hypothesis class. [00:07:21] When the hypothesis class grows, the approximation error will decrease. This is because the approximation error measures how good g is, and g is the best thing in the class; if you're adding more things, the best thing is only going to get better. In other words, you're taking a min over a larger set, which is bound to decrease. [00:07:48] The second thing that happens is that the estimation error increases when the hypothesis class grows. This is because it's harder to estimate something more complex: there are just more functions among which the learning algorithm has to figure out which one is the correct one, given the limited data. [00:08:10] There are ways to make this more precise using the tools from statistical learning theory, but I'll just leave it as intuition for now. [00:08:20] So given these trade-offs, what are the ways that we can use to control the size of the hypothesis
class? [00:08:28] So we're going to focus our attention on linear predictors. Remember, in linear predictors each predictor has a particular weight vector, so effectively the size of the set of weight vectors determines the size of the hypothesis class. [00:08:46] One thing you can do is reduce the dimensionality of the set of possible weight vectors. Pictorially, this looks like the following: imagine you have three features, so the set of weight vectors, where each weight vector is three-dimensional, is a ball. If you remove one feature, then you end up with a two-dimensional ball. Equivalently, this is saying that one of the features has to have zero weight, which you can think about as a restriction on the set of values that w can take.
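The "removing a feature is the same as forcing its weight to zero" picture can be checked directly; the numbers here are toy values of my own.

```python
import numpy as np

w   = np.array([0.4, -1.2, 0.7])   # a weight vector over three features
phi = np.array([1.0, 2.0, 3.0])    # a feature vector

# Restricting the hypothesis class: force the third weight to zero.
w_restricted = w * np.array([1.0, 1.0, 0.0])

# The restricted predictor behaves exactly as if the feature were removed.
assert np.isclose(w_restricted @ phi, w[:2] @ phi[:2])
```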
template selection. [00:09:35] You can do this manually, by adding feature templates, seeing if they help, and removing them if they don't; you're trying to manually figure out what small set of features actually gets you good accuracy. There are also ways to do this more automatically: you can do forward selection, boosting, or L1 regularization. This is beyond the scope of the class, but there are ways to make this less manual. [00:10:02] One thing I want to stress is that the dimensionality, the number of features, is the key quantity that matters, not the number of feature templates, and also not the complexity of each individual feature. Imagine you write a thousand lines of code to compute one feature: it's still a very simple hypothesis class, because it's just one feature, insofar as generalization is concerned.
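Forward selection, mentioned above, is easy to sketch. This is a minimal illustration, not the course's code: `train_and_validate` is a hypothetical helper that trains a predictor on the given feature templates and returns its validation error, and the `toy_error` function below is fabricated just to exercise the loop.

```python
def forward_selection(all_templates, train_and_validate):
    """Greedily add the feature template that most reduces validation error."""
    chosen = []
    best_err = train_and_validate(chosen)  # error with no features at all
    while True:
        candidates = [t for t in all_templates if t not in chosen]
        if not candidates:
            break
        # Try adding each remaining template; keep the most helpful one.
        errs = {t: train_and_validate(chosen + [t]) for t in candidates}
        t_best = min(errs, key=errs.get)
        if errs[t_best] >= best_err:
            break  # no remaining template helps, so stop
        chosen.append(t_best)
        best_err = errs[t_best]
    return chosen, best_err

def toy_error(templates):
    """Fabricated stand-in for train-then-measure-validation-error."""
    s = set(templates)
    err = 0.72
    if "entity" in s:
        err -= 0.50
    if "left" in s:
        err -= 0.05
    if "right" in s:
        err -= 0.05
    if "noise" in s:
        err += 0.02  # a harmful template is never kept
    return err

chosen, best = forward_selection(["entity", "left", "right", "noise"], toy_error)
```

Boosting and L1 regularization achieve a similar effect without the explicit search; L1's penalty drives individual weights exactly to zero, which amounts to dropping those features.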
[00:10:31] So the second strategy is controlling the norm, or the length, of this weight vector. Visually, this looks as follows: if you have a set of weight vectors which are bounded in length, you can shrink that length, and that results in a smaller circle, which is clearly a smaller set of weight vectors. This is probably the most common way to control the size of the hypothesis class. [00:11:08] There are two ways to control the norm. One is by regularization. Remember, the objective which we didn't like was minimizing the training loss of w, because that can lead to overfitting. So one way to regularize is to add a penalty term: lambda over 2 times the norm of w squared. Here lambda is a positive number which controls the strength of this penalty, and what this penalty does is it says: let's try to minimize the
training loss, [00:11:48] but we also want to keep the norm small, because we're taking a min over the sum here. [00:11:54] So if we look at what gradient descent does to this objective, we can interpret it as follows. Gradient descent, remember, initializes the weights, iterates over T epochs, and performs an update: w minus eta, the step size, times the gradient of the training loss. And now we take the gradient of this penalty, which is just lambda times w. Remember, we're subtracting eta times this, so if w is, let's say, (10, 10), then what we're going to do is subtract that vector and move the weights closer to zero, by an amount that depends on eta and lambda.
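As a sketch of that regularized update (the zero training-loss gradient below is a stand-in chosen to isolate the penalty's effect; everything here is illustrative, not the course's code):

```python
def gd_with_l2(grad_train_loss, w0, eta, lam, epochs):
    """Gradient descent on TrainLoss(w) + (lam/2) * ||w||^2.

    The gradient of the penalty (lam/2) * ||w||^2 is lam * w, so each
    update also shrinks the weights toward zero.
    """
    w = list(w0)
    for _ in range(epochs):
        g = grad_train_loss(w)
        w = [wi - eta * (gi + lam * wi) for wi, gi in zip(w, g)]
    return w

# With a zero training-loss gradient, the update reduces to
# w <- (1 - eta * lam) * w. One step from (10, 10) with eta = 0.1 and
# lam = 0.5 gives (9.5, 9.5); the weights decay toward zero.
w = gd_with_l2(lambda w: [0.0] * len(w), [10.0, 10.0], eta=0.1, lam=0.5, epochs=1)
```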
[00:12:46] So another way to control the norm is by early stopping. Early stopping is really easy to explain, so here it is: you run gradient descent as usual; you initialize w, you repeat for a number of epochs, and you perform the update. The only thing is that you're just going to reduce the number of epochs you go for. That's it. [00:13:07] This seems like a hack, and you can develop some theory about it, but the intuition is that when you start the weights at zero, that's the smallest norm, and when you update the weights over a number of iterations, the norm of w is actually going to grow. It's not obvious that this always happens, but empirically it's generally true. So by stopping gradient descent early, you're saying: don't let the norm of w get too big. The lesson here is that you're trying to minimize the training error, but you're not trying too hard, because you're just going to call it quits after a while.
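The norm-growth intuition can be seen on a made-up one-example least-squares problem (this toy loss is my own, not from the lecture): starting from w = 0, the norm of w grows with the number of epochs, so capping the epochs caps the norm.

```python
def norm(w):
    return sum(wi * wi for wi in w) ** 0.5

def train(epochs, eta=0.1):
    """Gradient descent from w = 0 on a tiny least-squares problem:
    one example x = (1, 1), target y = 10, loss (w . x - y)^2 / 2."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        residual = w[0] + w[1] - 10.0          # w . x - y
        w = [wi - eta * residual for wi in w]  # gradient step
    return w

# Fewer epochs -> smaller norm: stopping early keeps ||w|| small.
norms = [norm(train(t)) for t in (0, 1, 5, 50)]
```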
[00:13:46] Okay, so let's summarize now. We started by saying that the training loss is not the true objective: the real objective is minimizing the loss on unseen future examples. Unfortunately, we don't have access to that, so we're going to settle for the loss on some test data, which serves as a surrogate for the unseen examples. Then we studied approximation and estimation error as a way to understand generalization, and it's always going to be a balancing act between fitting the training error and not letting your hypothesis class grow too big. And the mantra to end with is: perhaps just keep it simple. So right now we've introduced a bunch of knobs for controlling the size of the hypothesis class; next we'll see how to actually turn them.

================================================================================ LECTURE 015 ================================================================================ Artificial Intelligence & Machine Learning 12 - Best Practices | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=ouvGV2YZEEM --- Transcript

[00:00:05] So we've spent a lot of time talking about the formal principles of machine learning. In this module I'm going to talk more about the empirical aspects
of machine learning practice. [00:00:17] So recall the three design decisions for a machine learning algorithm: first you set up the hypothesis class, then the training objective, and then the optimization algorithm. And each of these design decisions itself has a bunch of choices. [00:00:35] For the hypothesis class, you have to specify the feature extractor phi: do you use linear features or quadratic features? You also have to specify the architecture: do you use a linear predictor, a one-layer neural network, or a two-layer neural network, and how many hidden units do you have when you use a neural network? [00:00:56] For the training objective, there's the question of what the loss function should be, say the hinge loss or the logistic loss, and then what about regularization: do you use regularization, and what should its strength be? [00:01:10] For the optimization algorithm, even vanilla stochastic gradient descent has
two hyperparameters: one is the number of epochs, and another one is the step size. [00:01:22] Here it's a constant, but maybe you want it to be decreasing, or you want to use a fancier adaptive step-size rule like AdaGrad or Adam. [00:01:30] If you're training deep neural networks, there are more things to think about: there's initialization, how much noise you add during training, what batch size you use for stochastic gradient descent (batch size 1, or 4, or 16), and what about using a dropout rate to guard against overfitting? [00:01:52] So quickly you see that the design space becomes quite big, and it's really kind of like choose-your-own-adventure. Some of these design decisions can be made based on principles: for example, if you believe that your data has some sort of periodic structure, you can add periodic features. But many, if not most, of the design decisions are really unclear,
[00:02:17] and you sometimes just want an automatic way for these design decisions to be made. [00:02:25] So each of these design decisions is called a hyperparameter: hyperparameters are the design decisions that need to be made before running the learning algorithm. So how do you choose them? [00:02:38] How about we choose the hyperparameters to minimize the training error? This is a really bad idea, because the optimum would be to just include all the features, use no regularization, train forever, and really drive the training loss down, down, down. But remember, the training loss is not the quantity that we care about. [00:03:03] Okay, so how about we choose the hyperparameters to minimize the test error? This might actually generate good hyperparameters, but it's also bad, because now you're looking at the test set, which makes it an unreliable estimate of the actual error. [00:03:23] So what do we do then?
[00:03:27] The solution is to use a held-out validation set, also known as a holdout set or development set. This set is just taken out of the training set, and it's used to optimize hyperparameters, and nothing else. [00:03:39] So here's a picture: you leave the test set alone; it's isolated from what you're doing. And you take the training set and divide it into a validation set, which is usually a small fraction, but large enough to get reliable estimates, and then the rest of the training set. Now, for each setting of the hyperparameters, you can train on the training set minus the validation set, then evaluate on the validation set, and then you choose the hyperparameters to be the ones that minimize the error on the validation set.
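That recipe can be sketched as follows. Everything named here is a hypothetical stand-in for your actual pipeline: `train(train_examples, setting)` returns a predictor, and `error(predictor, examples)` returns its error rate.

```python
import random

def choose_hyperparameters(examples, settings, train, error,
                           val_fraction=0.2, seed=0):
    """Split off a validation set, then pick the hyperparameter setting
    whose trained predictor has the lowest validation error."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    val, train_rest = shuffled[:n_val], shuffled[n_val:]
    # Train on train-minus-validation, evaluate on validation.
    return min(settings, key=lambda s: error(train(train_rest, s), val))
```

Note that the test set never appears here; it stays locked away until the very end.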
[00:04:15] So now I'm going to talk about model development strategy. We've talked a lot about the formal machinery, and I'm just going to walk you through a typical development cycle. [00:04:31] So what you do is start out by splitting the data: you get some data and you split it into train, validation, and test, and you lock away the test set. Then you look at the data, not the test data but the train or validation data, to get intuition; you want to understand what kind of properties the problem you're trying to solve has. [00:04:55] And then you repeat the following: you implement a model architecture or a feature extractor, or you adjust some hyperparameters, and then you run the learning algorithm to train a model. Then you look at and sanity-check the train and validation errors along the way, making sure the training error is going down, and making sure the validation error more or less goes down: if it goes up, that means you're overfitting. You also want to look at, at least for linear classifiers, the weights, if they're
interpretable, as a sanity check and to get some intuition. [00:05:31] And you also want to look at some prediction errors: if the model is not doing as well as you'd like, you want to understand how it is screwing up. You repeat this until you're satisfied, and then finally you unlock the test set: you evaluate on the test set to get your final error rates that you put in your report. [00:05:53] So let's walk through an example of how this works. I'm going to take the simple example of named-entity recognition. Here the input is a string which contains a name, along with a word to the left and a word to the right to offer some context, and the output is going to be whether x, excluding this initial and final word, is a person or not. In this case, "Gavin Newsom" is plus one, a person. [00:06:28] Okay, so now I'm going to code this up. We have ner.py,
[00:06:38] which is the file that we're going to use, and this file actually depends on submission.py and util.py from your sentiment homework, so if you have that, you can plug in your code and see it in action for yourself. [00:06:54] Let me just walk through this: first we read the training examples and the validation examples, and then we're going to learn a predictor. This returns a set of weights; we're going to output the weights to a file, and output the error analysis to a file, which I'll show you in a second. And then this part is commented out, because we don't want to run evaluation on the test set just yet. [00:07:21] Okay, so the first thing we want to do is just open up this training file, to get some intuition for what the data looks like. Each line here is a training example: this is y, a minus, which means not a person, and this is x, the string.
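A reader for that file might look like the following sketch. The exact on-disk format isn't shown in the lecture, so this assumes each line is a "+" or "-" label, a space, and the example string; the function name is my own, not necessarily the one in ner.py.

```python
def read_examples(path):
    """Read (x, y) pairs from lines like '- took Mauritius into',
    where y is +1 for a person and -1 otherwise (assumed format)."""
    examples = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            label, x = line.split(" ", 1)
            y = 1 if label.startswith("+") else -1
            examples.append((x, y))
    return examples
```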
[00:07:36] "took Mauritius into": Mauritius is not a person. US is not a person, Malaysia is not a person, Sarah Pakowski is a person, plus one, Moscow is not a person, and so on. You can see all these training examples; we have around 7,000. [00:07:59] Okay, so now let's begin by implementing the feature extractor. I have to implement this function, which is going to take x, and, just to put in a comment of what x looks like, an example x is the string here. Then I'm going to define the feature vector phi to be a dictionary, so this is going to be a sparse representation of the feature vector. [00:08:34] Okay, so that is the simplest feature extractor: it happens to be the empty vector, with no features, but let's just see what happens; start simple, and we're starting really simple here. [00:08:49] Okay, so let's run python ner.py. We see that, across a number of
iterations, [00:08:57] the test error is really high: 72 percent error. This is not surprising, because we don't have any features. [00:09:06] Okay, so let's add some features. Maybe a kind of obvious feature to add is to look at the identity of the entity. So what we're going to do is process x a little bit: I'm going to split it into a bunch of tokens, a list containing "took", "Mauritius", and "into", and then I'm going to split that up into the left word, the entity, and the right word. This is going to be tokens[0]; tokens[1] through everything except for the last token; and then the last token. Okay, so I'm just going to divide x into these three parts. [00:09:52] So now I can define a feature template: let's define the feature to be phi of "entity is" the entity, and, since the entity is now an array, I'm going to join it, and I'm going to
[00:10:10] set that to 1. So this is one line that represents one feature template, but it instantiates into a whole bunch of different features, one for every possible entity. [00:10:27] I'm naming the feature in a way that makes it really interpretable; we'll see how this is quite useful. [00:10:38] Okay, so let's run this and see what happens: now the error is 19 percent, so that's progress. The training error is really low, which means that we're really fitting the training data. [00:10:52] So now let's go and inspect what happens. If we look at the weights here, sorted from positive to negative, we have the feature name and the weight. Up here you can see features like "entity is ...", and these generally happen to be people's names, and if you look at the bottom, we see things which are not names. Okay, so this is a good sanity check that suggests that the learning is working.
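A sketch of the extractor at this point (the function name and the exact feature-string format are my own; the course's ner.py may differ):

```python
def feature_extractor(x):
    """Sparse feature vector for a string like 'took Mauritius into'.

    One feature template, 'entity is ___', instantiated once per entity;
    `left` and `right` are split off here but only used by the context
    templates added later."""
    tokens = x.split()
    left, entity, right = tokens[0], tokens[1:-1], tokens[-1]
    phi = {}  # sparse representation: feature name -> value
    phi["entity is " + " ".join(entity)] = 1
    return phi

phi = feature_extractor("took Mauritius into")
# phi == {"entity is Mauritius": 1}
```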
[00:11:27] Let's look at the error analysis. This shows you, on the validation set, the predictions that the model makes. So here is the first input: the true label is plus one, a person, but we predicted minus one, which is wrong. And here I'm showing the features and their particular weights: "entity is Romero" has a feature value of one, and its weight is zero. A weight of zero generally means that the model never saw this feature at training time, so the score is zero, and we have no idea what to do with this example. [00:12:08] Here is another example, the Senate: it says "entity is Senate", and Senate has a weight of negative one, so we have a score of negative one and we predict minus one. [00:12:20] So let me just look through these incorrect predictions: Margaret Allah, Flamed, Midfielder.
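The scores being read off here are just sparse dot products between the learned weights and the feature vector; a feature absent from the weight dictionary (never seen at training time) contributes zero. A minimal sketch with made-up weights:

```python
def score(weights, phi):
    """Sparse dot product w . phi(x); unseen features get weight 0."""
    return sum(weights.get(name, 0) * value for name, value in phi.items())

weights = {"entity is Senate": -1}             # made-up trained weights
s1 = score(weights, {"entity is Senate": 1})   # negative, so predict minus one
s2 = score(weights, {"entity is Romero": 1})   # zero: feature never seen
```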
[00:12:36] And you can kind of see, well, it's unreasonable to expect that the entities have all been seen before. So why don't we try to use the context to figure out whether the name is a person or not? [00:12:52] So let's go over here: I'm going to define a feature template "left is" the left word, and "right is" the right word. So this is a feature template, "left is blank", as we've written in the past, and I'm instantiating this feature for this particular x, which takes the actual value of left here. Okay, so I added two feature templates; let's run this. [00:13:25] And now we can see that the error rate has gone down to 11 percent. Notice that the training error doesn't actually go down as fast, because with more features sometimes it's harder to optimize, but that's okay: we don't care about the training error, we only care about the test error going down.
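Extending the earlier extractor sketch with the two context templates (again illustrative; names and formats are my own):

```python
def feature_extractor(x):
    """Entity-identity plus left/right context feature templates."""
    tokens = x.split()
    left, entity, right = tokens[0], tokens[1:-1], tokens[-1]
    phi = {}
    phi["entity is " + " ".join(entity)] = 1
    phi["left is " + left] = 1    # e.g. 'left is minister' suggests a person
    phi["right is " + right] = 1
    return phi

phi = feature_extractor("minister Gavin Newsom said")
# {'entity is Gavin Newsom': 1, 'left is minister': 1, 'right is said': 1}
```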
[00:13:52] One note: this says test error, but I'm actually passing in the validation set here; the learn-predictor function just prints "test error" because it has no idea which set it's given. Okay, so let's look at the weights. At the top are still features that look at the entity itself: Clinton, Nelson. Here are some examples: "left is minister", so if you have "minister <someone>", that someone is probably a person, and "president <someone>" is probably a person. If you look down here, you see that if the left context is "the", the weight is negative, which means what follows is probably not a person. So this all makes sense; it's a good sanity check. [00:14:41] Let's look at the error analysis. We're now getting the Romero example correct. Let's see what we're getting wrong: "Felix" and "Attila", which I guess we've never seen before, and "Workers Party", which it's also never seen. And now you can
think more, brainstorm: well, maybe we aren't going to see the exact string match of an entity, but maybe we can break it down into pieces. [00:15:18] So what I'm going to do is, for each word in the entity, I'm going to say "entity contains <word>". It's pretty easy to write feature templates, and this one is very intuitive: it just asks, does this entity contain a particular word? Okay, let's run this, and now the error rate has gone down to six percent, so we're making good progress. [00:15:51] Let's look at the weights. So "entity contains Clinton": this feature will fire both for "Clinton" and for "Bill Clinton". These "contains" features are more general, and they're given high weight.
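The contains-word template might be sketched like this (illustrative names, not the lecture's exact code); note how "Bill Clinton" shares a feature with "Clinton":

```python
# Sketch of the "entity contains ___" feature template: one indicator feature
# per word of the entity string.
def contains_features(entity):
    return {"entity contains " + word: 1 for word in entity.split()}

print(contains_features("Bill Clinton"))
# {'entity contains Bill': 1, 'entity contains Clinton': 1}
```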
[00:16:11] At the bottom, again, if an entity has "New" in it, it's probably something like "New York", and that's probably not going to be a person; "contains Newsroom", and I don't know too many folks by that name. [00:16:25] Error analysis: let's see what's still wrong here. We're still getting a few of these wrong, like this "Kurdistan Workers Party" one, and sometimes it's kind of hard to know what to do. So let's just try something else. Going in the spirit of decomposing the entity into words, we can go further and have patterns that match on prefixes and suffixes. So we can say the entity contains the prefix of a word, and just arbitrarily choose the first four characters, and likewise the suffix of a word, the last four characters. All right, let's see how this does, and now we can see that the error rate is
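A sketch of the prefix/suffix templates (again illustrative, not the lecture's exact code), with four characters as the lecture's arbitrary choice:

```python
# For each word of the entity, fire on its first and last n characters.
def prefix_suffix_features(entity, n=4):
    features = {}
    for word in entity.split():
        features["entity contains prefix " + word[:n]] = 1
        features["entity contains suffix " + word[-n:]] = 1
    return features

print(prefix_suffix_features("Clinton"))
# {'entity contains prefix Clin': 1, 'entity contains suffix nton': 1}
```

Features like "suffix nton" let the classifier generalize to unseen names that merely look like names it has seen.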
[00:17:31] going down to four percent, so we made a little more progress there. I'm actually going to call it quits for now, just in the interest of time. We've made a lot of progress: from seventy percent error to only four percent error. But remember, this is only on the validation set, so now comes the final trial: seeing how well this does on the test set. [00:17:56] So read in the test set, evaluate the predictor on it, and print the result. Let's run this and hope that we didn't overfit. And here we actually did even better on the test set than on the validation set, which sometimes happens; there's always some randomness. So we ended up with four percent error, which is pretty good for ten minutes of work. [00:18:28] In practice, things are probably not going to go as
smoothly as this; this is just an illustrative example, meant to show the kind of process. [00:18:40] Okay, there's much more to be said about the practice of machine learning, but I'm just going to give you some general advice. Many of these tips are related to good software engineering practice. [00:18:57] The first thing I want to talk about is starting simple. The wrong thing to do is to code up a really complicated learning algorithm, run it on a million examples, watch it crash and burn, and wonder what happened. Simplify: both in terms of running on small subsets of your data, maybe even synthetic data, and in terms of starting with a simple baseline model. We started with, literally, a classifier that had zero features, then one feature, just so we could see and understand what it's doing. This is important because
it allows you to work in a regime where things are understandable and, importantly, where things run quickly. [00:19:38] You want fast iteration time, like what you just saw: we could quickly try something, get a result, try something else, get a weird result, and react. If you have to wait ten hours to get a result, you're just not going to make as much progress, because you won't be able to iterate. [00:19:56] One sanity check that I would recommend: try to train on very few examples, like five, and see if you can overfit, that is, drive the training error to zero. Of course, doing so is not going to give you a useful model, but it will tell you whether the machinery is working or not. If you're unable to fit five examples, something is wrong.
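This sanity check can be sketched as follows: a tiny perceptron (not the lecture's code) trained on five hand-made, separable examples. If even this cannot drive the training error to zero, something in the pipeline is broken.

```python
# Sanity check: train on ~5 examples and verify training error reaches zero.
def train_perceptron(examples, epochs=100):
    weights = {}
    for _ in range(epochs):
        for phi, y in examples:           # y in {+1, -1}, phi a feature dict
            score = sum(weights.get(f, 0.0) * v for f, v in phi.items())
            if y * score <= 0:            # misclassified: perceptron update
                for f, v in phi.items():
                    weights[f] = weights.get(f, 0.0) + y * v
    return weights

def train_error(weights, examples):
    mistakes = sum(
        1 for phi, y in examples
        if y * sum(weights.get(f, 0.0) * v for f, v in phi.items()) <= 0
    )
    return mistakes / len(examples)

# Five hand-made, linearly separable toy examples (illustrative features):
examples = [
    ({"contains Clinton": 1}, +1),
    ({"contains Bill": 1, "contains Clinton": 1}, +1),
    ({"contains York": 1}, -1),
    ({"contains New": 1, "contains York": 1}, -1),
    ({"contains party": 1}, -1),
]
w = train_perceptron(examples)
print(train_error(w, examples))  # 0.0
```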
It could mean that your data is too noisy, or that you lack certain features, or that your model is not expressive enough, or that your learning algorithm isn't working. Anyway, it's a good sanity check. [00:20:38] The second thing is: log everything. Print out metrics: track the training loss and the validation loss over time, and make sure they're going down as intended. Record the hyperparameters you're using to train, so you can keep track of what you actually did to get your result. Record statistics of the dataset (how many features, how many examples), of the model (how many weights there are, the norm of the weights), and the predictions, as you saw; it was really useful to have that file showing exactly how the model makes each prediction, because it gives you a lot more insight. [00:21:21] Finally, spend some time figuring out how to organize your experiments. I like to have each run I make go into a separate folder.
[00:21:31] Each run's folder saves the models, the predictions, and a record of all the hyperparameters you used, so that later you can go back and check what you did. [00:21:47] Then a note about reporting your results: it's important to run your experiments multiple times, particularly with different random seeds, to make sure your results are stable and reliable; then you can report the mean and the standard deviation over these random seeds. [00:22:10] And finally, in machine learning we often tend to be guilty of distilling everything down into one number such as the test error, but in practice we might be interested in multiple metrics. In particular, it's important that if you get five percent error, you understand what those errors are.
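A minimal sketch of reporting mean and standard deviation over seeds; `run_experiment` here only simulates a validation error rather than training a real model, so the numbers are stand-ins.

```python
import random
import statistics

# Hypothetical experiment: in a real setup this would train and evaluate a
# model using the given seed. Here it just returns a simulated error.
def run_experiment(seed):
    rng = random.Random(seed)
    return 0.04 + rng.uniform(-0.005, 0.005)  # pretend validation error

errors = [run_experiment(seed) for seed in range(5)]
print(f"error = {statistics.mean(errors):.3f} +/- {statistics.stdev(errors):.3f}")
```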
Sometimes it's useful to report the error rates on different minority groups or subpopulations, if you have access to that information, and generally to be cognizant of the biases in your model. [00:22:47] Okay, to summarize: we've talked about the practice of machine learning. First, make sure you have good data hygiene: separate out your test set and leave it alone, and divide your training set into a validation set and the rest. Don't look at or touch the test set, but do look at the training and validation sets to understand the shape of your data, so that you have intuition for deciding how to model it. Start simple. And finally, there are a lot of design decisions, which can be overwhelming at first; the most important thing is to practice doing experimentation, so that you start developing an intuition for which hyperparameters matter and what kind of effect they have, and then eventually developing a set of
best practices for yourself. [00:23:42] Okay, that's all.

================================================================================ LECTURE 016 ================================================================================ Machine Learning 13 - K-means | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=5-Fn8R9fH7A --- Transcript

[00:00:05] Hi, in this module I'm going to talk about k-means, a simple algorithm for clustering, which is one form of unsupervised learning. [00:00:13] I want to start with a classical example of clustering from the NLP literature: Brown clustering. This was the unsupervised learning method of choice before word vectors, contextualized word embeddings, and so on. The input to the algorithm was simply raw text, lots and lots of words, and the output was a clustering of those words. [00:00:41] The algorithm was able to pick out cluster one, which had Friday, Monday, Thursday, and generally days of the week; cluster two had months; cluster three had some sort of natural resources; and so on down the list, and each cluster had
[00:00:58] fairly coherent structure in it. [00:01:05] One thing that's quite interesting to note is that no one told this algorithm what the days of the week were, or the months, or what family relations are; it was able to figure all of this out just by looking at the data. [00:01:19] On a personal note, Brown clustering was actually my first experience that got me to pursue research in NLP in the first place; just seeing the results of unsupervised learning when it worked was really magical. And of course, today we're seeing even stronger evidence of the sheer potential of unsupervised learning with large language models. [00:01:45] So I want to contrast unsupervised learning with supervised learning. In supervised learning we looked at classification: you're given a training set which is labeled, so inputs are labeled with an output y.
[00:02:01] This goes into a learning algorithm, you get a classifier, and then you predict on new points. [00:02:09] The main challenge with labeled data is that it is expensive to obtain: you need annotators, often domain experts. [00:02:21] In contrast, unsupervised learning, of which clustering is one form, uses unlabeled data, which is very cheap to obtain. [00:02:34] As a concrete example, suppose you have some unlabeled points here; any dataset can be visualized as points like this. We want a learning algorithm that produces not a predictor but an assignment of each point to a cluster. Say we have two clusters: let's assign the first four points to the blue cluster, and the second set of points down here to the orange cluster. So
intuitively, we want to assign nearby points to the same cluster, and you can kind of see that these points are close to each other and those points are close to each other, with some separation between the two clusters. [00:03:36] More formally, in the task of clustering we're given some training points Dtrain, a list of points x1 through xn, and the output is an assignment of each point to a cluster. Formally, we have an assignment vector z = (z1, ..., zn), where each zi is a number between 1 and K. So, assuming we have K clusters, each point is assigned to one of them. [00:04:17] So what makes a cluster? The key assumption behind k-means is that each cluster can be represented faithfully by a centroid, and we concatenate all the centroids together to form mu. [00:04:36] This diagram illustrates what a centroid is trying to
capture. [00:04:43] The centroid is, in some sense, a point which is closest to all the other points in that cluster: it represents the cluster by some concrete point in space. So the intuition, in terms of centroids, is that we want each point to be close to its assigned centroid mu_{zi} (a bit of notation which I'll go through later); intuitively, you can look at this point over here: we want it to be close to the centroid of its assigned cluster, and this other point to be close to its centroid. [00:05:27] So now we can define the k-means objective function; here's a picture which I'll walk through. The k-means objective is denoted as a loss, the k-means loss function, and it's a function of the cluster assignments z1 through zn and the cluster centroids mu_1 through mu_K. It is equal to a sum over all n points, looking at the
[00:06:01] i-th point: zi is a number between 1 and K specifying which cluster point i is assigned to, so I access its centroid mu_{zi}, take the difference between the point and that centroid, and square it. So each term is the squared distance between a point and its assigned centroid:

    Loss_kmeans(z, mu) = sum over i = 1..n of ||x_i - mu_{zi}||^2

[00:06:32] Pictorially, for each point I look at its assigned centroid and the squared length of the dashed line between them; the sum of the squares of all the dashed lines is exactly the k-means loss, and I want this to be as small as possible. So I want to minimize this objective with respect to both the cluster assignments and the centroids. [00:07:11] To get some intuition, let's consider a simpler example in one dimension.
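The objective can also be written directly in code. This sketch uses the lecture's four 1-D points with 0-indexed clusters (the lecture numbers clusters from 1):

```python
# The k-means loss: sum of squared distances between each point and its
# assigned centroid, for 1-D points.
def kmeans_loss(points, z, mu):
    return sum((x - mu[zi]) ** 2 for x, zi in zip(points, z))

points = [0, 2, 10, 12]   # the lecture's four 1-D points
z = [0, 0, 1, 1]          # assignments (0-indexed)
mu = [1, 11]              # centroids
print(kmeans_loss(points, z, mu))  # (0-1)^2 + (2-1)^2 + (10-11)^2 + (12-11)^2 = 4
```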
So here we have four points, at 0, 2, 10, and 12. [00:07:27] I'm going to consider the case where we know what the centroids are: does that make our life easier? If we know the centroids are at 1 and 11, this indeed becomes a pretty trivial problem, because now, how do we assign points? We just assign each point to the closest centroid, since we know where the centroids are. The point 0 is closest to 1, so z1 gets cluster one; for z2, the point 2 is also closest to 1, so cluster one; the point 10 is closest to 11, the centroid for cluster two; and the same for 12. So all I'm doing is looking at all the centroids and asking which one is closest to the point I'm trying to assign. [00:08:22] Now let's consider the opposite case, where I don't know the centroids but I do have the assignments. If I have the assignments, then I can also compute the centroids.
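The assignment step just described, with the centroids fixed at 1 and 11, can be sketched as (0-indexed clusters, illustrative code):

```python
# Step 1 of k-means: with centroids mu fixed, assign each point to the
# closest centroid by squared distance.
def assign_clusters(points, mu):
    return [min(range(len(mu)), key=lambda k: (x - mu[k]) ** 2) for x in points]

print(assign_clusters([0, 2, 10, 12], [1, 11]))  # [0, 0, 1, 1]
```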
[00:08:37] To compute the centroid of the first cluster, I simply look at all the points that are assigned to that cluster, and remember, I want to find the centroid which is as close as possible to all of them on average. So it is the minimizer of the sum of squared distances, and recall that this is optimized in closed form by setting the centroid to the mean of the points assigned to that cluster. So for mu_2, the points 10 and 12 are assigned to that cluster, and their mean is 11.
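The centroid step can likewise be sketched in code (0-indexed clusters; for simplicity an empty cluster is left at 0 here, which real implementations handle differently):

```python
# Step 2 of k-means: with assignments z fixed, set each centroid to the mean
# of the points assigned to it.
def update_centroids(points, z, K):
    return [sum(x for x, zi in zip(points, z) if zi == k)
            / max(1, sum(1 for zi in z if zi == k))
            for k in range(K)]

print(update_centroids([0, 2, 10, 12], [0, 0, 1, 1], 2))  # [1.0, 11.0]
```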
[00:09:20] So now, given either the cluster assignments or the centroids, we can successfully recover the other one optimally. But this is a chicken-and-egg problem, because we have neither the centroids nor the assignments to begin with. So what can we do? Well, let's just take a gamble and initialize randomly. We're going to initialize the centroids to some random values; usually they are assigned to some of the existing data points, so let's assign them to the points at 0 and 2. Clearly this is not optimal, but let's try to iterate. In the first iteration, we fix these centroids and optimize the cluster assignments, so we look at each point and try to assign it to one of the clusters. The point at 0 is closest to the centroid at 0, so I assign it to cluster 1. The point at 2 is closest to cluster 2
[00:10:33] because its centroid is right on top of it, so that goes to cluster 2, and the points at 10 and 12 are also closest to cluster 2, so I assign them there as well. Then I use these new assignments to re-estimate the centroids. The first cluster contains only the point at 0, so its centroid stays there. For the second cluster I now have three points, and I place the centroid to minimize the squared distance to all of them, which is their average, (2 + 10 + 12) / 3 = 8. So now I have these updated centroids at 0 and 8, and you can see this is looking a bit better. In the second iteration, I'm going to reassign the points based on these new centroids. The first
[00:11:43] point is going to be assigned to cluster 1, right here. The point at 2 is also going to be assigned to cluster 1, because the centroid at 0 is closer to 2 than the centroid at 8 is. The point at 10 is going to be closest to the second cluster, and the same with the point at 12. [00:12:10] So now we have new cluster assignments, and we can go back and re-estimate the centroids. We're back in the familiar problem from the previous slide: for the first cluster, we set the centroid to be the mean of the two points, which is 1, and the same for the second cluster, where the mean of 10 and 12 is 11.
[00:12:39] And now we've actually converged: if you try to repeat this process, nothing will change, so we're done. In this case it happens to recover the optimal clustering for these four points, even though we didn't know anything to start out with. [00:12:54] So here is the k-means algorithm stated more formally. First, we initialize all the centroids randomly. Then we iterate T times, or until convergence, alternating between step 1, which sets the assignments given the centroids, and step 2, which sets the centroids given the assignments. In step 1, we go through each point i and set its assignment z_i: for each of the clusters k = 1 through K, we look at where the point is and compute
[00:13:47] the squared distance between the point and the centroid of that cluster, and then we take the argmin, the cluster k which minimizes this distance. For step 2, we loop over all the clusters, and we set the centroid of each cluster by looking at all the points i which are assigned to that cluster, summing them up, and dividing by the number of points we summed over to get the average. Okay, so that is the k-means algorithm. [00:14:42] Now, one word about whether it works or not. The k-means algorithm is guaranteed to converge to a local minimum of the k-means objective, but it is not guaranteed to find the global minimum. Here's a cartoon picture of an optimization landscape: it can converge to a local minimum but not the global one.
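The two alternating steps just described can be sketched in a few lines of Python. This is a minimal 1-D version of my own (the lecture's μ_k become plain numbers), initializing the centroids at random data points as in the example:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal 1-D k-means: alternate assignment and centroid steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize at random data points
    z = [0] * len(points)
    for _ in range(iters):
        # Step 1: assign each point to the closest centroid (argmin over k).
        z = [min(range(k), key=lambda j: (p - centroids[j]) ** 2)
             for p in points]
        # Step 2: move each centroid to the mean of its assigned points.
        new = []
        for j in range(k):
            members = [p for p, zi in zip(points, z) if zi == j]
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:  # converged: another pass would change nothing
            break
        centroids = new
    return centroids, z

centroids, z = kmeans([0, 2, 10, 12], k=2)
print(sorted(centroids))  # [1.0, 11.0] -- the optimum found in the example
```

On these four points, any initialization at two distinct data points converges to centroids 1 and 11, matching the worked example.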
[00:15:04] So here, if you click here, I have a demo which shows you how k-means works. You can construct your own set of training examples, and if you step through the k-means algorithm, it initializes and then alternates between moving the centroids around and reassigning the points. In this happy case, we actually get to a pretty good clustering: the blue points over here, red points over here, and green over there, with a k-means objective of 44.7. But if I initialize in a slightly different way, let's see what happens: it converges to something with much worse error, and you can see visually that this is a sub-optimal clustering, because this cluster has only one point, whereas this other cluster has many points which are spread out. So what do you do about this? Well, there are a couple of
[00:16:19] things you can do. One is to just run the algorithm multiple times with different random initializations and take the best result. Another is to use a smarter heuristic. I didn't say very much about initialization, but there's a nice method called k-means++, where you initialize the centroids one at a time, setting each to a data point which is as far away as possible from the centroids chosen so far. This makes sure that the centroids are spread out, so that they can move over time to capture all the points in your data.
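The initialization idea described here can be sketched as follows. This is my own illustration of the farthest-point heuristic as the lecture states it; note that the actual k-means++ algorithm randomizes this step, sampling each new centroid with probability proportional to its squared distance from the centroids chosen so far:

```python
import random

def farthest_point_init(points, k, seed=0):
    """Pick centroids one at a time, each the data point farthest
    (in squared distance) from the centroids chosen so far."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: a random data point
    while len(centroids) < k:
        # For each point, distance to its nearest already-chosen centroid...
        def d2(p):
            return min((p - c) ** 2 for c in centroids)
        # ...and take the point that maximizes that distance.
        centroids.append(max(points, key=d2))
    return centroids

init = farthest_point_init([0, 2, 10, 12], k=2)
print(init)  # the two centroids land on opposite ends of the data
```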
[00:17:01] Okay, to wrap things up: so far we've talked about k-means, which is an algorithm for doing clustering, and clustering is a useful task that allows us to discover structure in unlabeled data, in particular to group points together. It's useful to distinguish two meanings of "k-means." [00:17:25] The first is the k-means objective: an objective function that says find assignments and centroids that minimize the squared distance between each point and the centroid of its assigned cluster. The second is the k-means algorithm, which performs alternating minimization on the k-means objective: setting the assignments given the centroids, and setting the centroids given the assignments. Because of this chicken-and-egg structure, it is not guaranteed to globally minimize the k-means objective, although it usually gets pretty good results. [00:18:12] Stepping back a little bit: k-means is for clustering, and there are other types of clustering methods out there. Clustering is a form of unsupervised learning, and generally unsupervised learning has a few use cases. One is just data
[00:18:32] exploration and discovery: you get a pile of data that you might not have had a chance to annotate or label, so you can run clustering or other types of unsupervised learning to group points and discover structure, to get insight. The second use case is that when you perform clustering, or some other kind of representation learning, you can get useful representations or features that you can feed into downstream supervised learning problems once you do get a bunch of labeled data, and this generally helps supervised learning work better. [00:19:15] Okay, so that's the end of this module. ================================================================================ LECTURE 017 ================================================================================ Search 1 - Dynamic Programming, Uniform Cost Search | Stanford CS221: AI (Autumn 2019) Source: https://www.youtube.com/watch?v=aIsgJJYrlXk --- Transcript [00:00:05] Hi everyone, I'm Dorsa, and this week I'll be teaching the state-based models part of the class. The plan is for the next couple of weeks for me to teach state-
based models, MDPs, and games, and after that Percy will come back and talk about the later topics. [00:00:22] A few announcements: homework 3 is out, so just make sure to look at that, and the grades for homework 1 will be coming out soon. [00:00:34] All right, so let's talk about state-based models; let's talk about search. To start, I was thinking maybe we can begin with this question. Let me tell you what the question is, and then think about it. The question is: you have a farmer, and the farmer has a cabbage, a goat, and a wolf, and everything is on one side of the river. The farmer wants to go to the
[00:01:13] other side of the river and take everything with him. But the farmer has a boat, and the boat can only fit two things, so the farmer can be in it with just one of the other things. The question is: how many crossings does the farmer need to take everything to the other side? And there are a bunch of constraints. If you leave the cabbage and the goat together, the goat is going to eat the cabbage, so you can't do that; if you leave the wolf with the goat, the wolf is going to eat the goat, so you can't do that either. So how many crossings should you take to get everything to the other side? Think about it, talk to your neighbors. Is everyone clear on the question? [Music] [00:02:36] The link doesn't work because I can't connect to the internet, but all right. So how many people think it is four crossings? Five? Six?
[00:02:49] Some people think six, some seven, some say there is no solution. Okay, so the point is actually not what the answer is; we will come back to this question and try to solve it. The important thing to think about right now is how you went about solving it: what was the process you were going through as you tried to solve this problem? That is the commonality that search problems share, and we want to think about these types of problems; they're more challenging to answer than, say, reflex-based questions. So that's just a motivating example that we'll come back to later. And here's an xkcd on this: one potential solution is that the farmer takes the goat to the other side, comes back, takes the cabbage to the other side, and just leaves
[00:03:39] the wolf, because why would a farmer need a wolf? If you were surprised, there's an interesting point in it: sometimes maybe you should change the problem, because your model is completely wrong; sometimes you should rethink, go back to your model, and try to fix it. But anyway, we'll come back to this question.
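Since we'll come back to the crossing puzzle, here is one mechanical way to answer it, as a preview of the search formulation this lecture builds: encode the state as the set of items still on the starting bank and breadth-first search over crossings. (The encoding and names here are my own sketch, not the lecture's.)

```python
from collections import deque

ITEMS = frozenset({"farmer", "cabbage", "goat", "wolf"})
UNSAFE = [{"goat", "cabbage"}, {"wolf", "goat"}]  # pairs left unsupervised

def safe(bank):
    """A bank is safe if the farmer is on it or no unsafe pair remains."""
    return "farmer" in bank or not any(pair <= bank for pair in UNSAFE)

def successors(state):
    """One crossing: the farmer moves alone or with one item from his bank."""
    here = state if "farmer" in state else ITEMS - state
    for cargo in [set()] + [{x} for x in here - {"farmer"}]:
        moved = {"farmer"} | cargo
        new = state - moved if "farmer" in state else state | moved
        if safe(new) and safe(ITEMS - new):
            yield frozenset(new)

def min_crossings():
    """BFS from 'everything on the start bank' to 'start bank empty'."""
    start, goal = ITEMS, frozenset()
    dist, frontier = {start: 0}, deque([start])
    while frontier:
        s = frontier.popleft()
        if s == goal:
            return dist[s]
        for t in successors(s):
            if t not in dist:
                dist[t] = dist[s] + 1
                frontier.append(t)

print(min_crossings())  # 7 -- the classic answer
```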
[00:04:03] All right, so this was our guideline for the class, and we have already talked about reflex-based models: we have talked about machine learning and how it can be applied. Now we want to start talking about state-based models: this week we're going to talk about search problems, next week MDPs, and the week after that, games. If you remember, the guideline that we had for the class was that we were thinking about three different paradigms: modeling, inference, and learning. [00:04:40] For reflex-based models we covered this already: the model could be a linear predictor or a neural network; inference, in the case of reflex-based models, was really simple, just function evaluation, since you have your neural network and you just go about evaluating it; and we also spent some time talking about learning, for example how you would use gradient descent to fit the parameters of the model. [00:05:08] We want to do a similar thing with search-based models and talk about these same three paradigms. The plan is to talk about models and inference today, and then on Wednesday we'll
[00:05:19] talk about learning. We'll have the same sort of format next week too: Mondays are going to be about modeling and inference, Wednesdays about learning, just to give you an idea of what the plan is. [00:05:29] All right, so what are search problems? Let's start with a few motivating examples. One example is route finding: you might have a map and you want to go from point A to point B on the map, and you have an objective, so maybe you want to find the shortest path, or the fastest path, or the most scenic one. The things you can do are take a bunch of actions: you can go straight, turn left, turn right. And the answer to the search problem is going to be a sequence of actions: if you want to go from A to B by the shortest path, the
[00:06:04] answer one would give is maybe: turn right first, then turn left, and then right again, or some other such sequence. So this is just a canonical example of what a search problem is. There are a few other examples. You can think of robot motion planning: if you have a robot that wants to go from point A to point B, it might have different objectives for doing that. Again, the question might be: what is the fastest way of doing it, what is the most energy-efficient way, or what is the safest way? Another question we are interested in is: what is the most expressive or legible way for the robot to do it, so that people can understand what the robot really wants? So again, you can have various types of objectives and formalize them, and then there are the actions you can take: in the
[00:06:50] case of robot motion planning, the robot has different joints, and each one of the joints can translate and rotate, so translations and rotations are the types of actions you can take. In this case I have a robot with seven joints, and I need to say what each one of those joints should do in terms of translation and rotation. This is my robot; yes, it's a Fetch robot. [00:07:13] All right, so let's look at another example: games. This is a fun one. You might think about something like a Rubik's Cube or this 15-puzzle, and again, what do you want to do as a search problem? Well, you want to end up in a configuration that's desirable, a particular target configuration of the Rubik's Cube or the 15-puzzle. That is the goal, that's the objective, and
then the action as you can move pieces around here so the sequence [00:07:41] move pieces around here so the sequence of actions might be how you're moving [00:07:43] of actions might be how you're moving these pieces are on to get that [00:07:44] these pieces are on to get that particular configuration of the 15 [00:07:47] particular configuration of the 15 puzzle ok so again another example of [00:07:49] puzzle ok so again another example of what a search problem is machine [00:07:52] what a search problem is machine translation is an interesting one it's [00:07:55] translation is an interesting one it's not necessarily the most natural thing [00:07:57] not necessarily the most natural thing you might think about when you think [00:07:58] you might think about when you think about search problems but what it is [00:08:00] about search problems but what it is actually you can think about it as a [00:08:01] actually you can think about it as a search problem again so imagine you have [00:08:03] search problem again so imagine you have a phrase in different language and you [00:08:05] a phrase in different language and you want to translate it to English so what [00:08:07] want to translate it to English so what is the objective here well you can think [00:08:09] is the objective here well you can think of the objective as going to fluent [00:08:11] of the objective as going to fluent English and preserving meaning so that [00:08:13] English and preserving meaning so that is the objective that one would have in [00:08:15] is the objective that one would have in machine translation and then the type of [00:08:18] machine translation and then the type of actions that you're taking is you're [00:08:20] actions that you're taking is you're appending words so you start with there [00:08:21] appending words so you start with there and then your appending blue to it and [00:08:23] and then your appending blue to it and you're appending hostage so so as you're [00:08:26] 
you're appending hostage so so as you're appending these these different [00:08:27] appending these these different different words those are the actions [00:08:29] different words those are the actions that you're taking so so in some sense [00:08:31] that you're taking so so in some sense you can have any complex sequential task [00:08:33] you can have any complex sequential task and the sequence of actions that you [00:08:35] and the sequence of actions that you would get to get to your objective is [00:08:38] would get to get to your objective is this going to be the answer for your [00:08:40] this going to be the answer for your search problem and you can pose it as a [00:08:41] search problem and you can pose it as a search problem ok all right so so what [00:08:46] search problem ok all right so so what is different between let's say reflex [00:08:48] is different between let's say reflex based models and search problem so if [00:08:50] based models and search problem so if you remember reflex based models the [00:08:52] you remember reflex based models the idea was you'd have an input X and then [00:08:55] idea was you'd have an input X and then we wanted to find this F for example a [00:08:57] we wanted to find this F for example a classifier that that would output [00:09:00] classifier that that would output something like like this Y which is labo [00:09:02] something like like this Y which is labo it's a plus 1 or minus 1 so the common [00:09:04] it's a plus 1 or minus 1 so the common thing in in these reflex based models [00:09:06] thing in in these reflex based models was we were outputting this this one [00:09:08] was we were outputting this this one they [00:09:09] they this one in this case action being minus [00:09:12] this one in this case action being minus 1 or plus 1 again in search problems the [00:09:15] 1 or plus 1 again in search problems the idea is I've given an input I'm given a [00:09:18] idea is I've given an input I'm given a state 
and then given that I have that [00:09:20] state and then given that I have that state what I want to output is a [00:09:22] state what I want to output is a sequence of actions so I do want to [00:09:24] sequence of actions so I do want to think about what happens if I take this [00:09:26] think about what happens if I take this action like how is that going to affect [00:09:28] action like how is that going to affect the future of my actions okay so so the [00:09:31] the future of my actions okay so so the key idea in search problems is you need [00:09:33] key idea in search problems is you need to consider future consequences of the [00:09:36] to consider future consequences of the actions you take at the currency like [00:09:39] actions you take at the currency like just outputting one thing and so if you [00:09:44] just outputting one thing and so if you rerun it so the question is yeah is it [00:09:45] rerun it so the question is yeah is it not the same as like I'm rerunning it I [00:09:47] not the same as like I'm rerunning it I asked you to thing and then I rerun it [00:09:49] asked you to thing and then I rerun it again [00:09:49] again and then you could do that but that ends [00:09:51] and then you could do that but that ends up being a little bit of a that would be [00:09:53] up being a little bit of a that would be something similar to a greedy algorithm [00:09:54] something similar to a greedy algorithm or like let's say I want to get to the [00:09:56] or like let's say I want to get to the door and I want to find a find the [00:09:58] door and I want to find a find the fastest way and and right now if I just [00:10:00] fastest way and and right now if I just look at like my current state maybe I [00:10:01] look at like my current state maybe I think the fastest way of getting there [00:10:03] think the fastest way of getting there is going this way but if I actually [00:10:05] is going this way but if I actually think about a horizon and I think 
about [00:10:06] think about a horizon and I think about how this action is going to affect my [00:10:08] how this action is going to affect my future I might call with different [00:10:10] future I might call with different sequence of actions yeah all right okay [00:10:15] sequence of actions yeah all right okay so and then we've already seen this [00:10:16] so and then we've already seen this paradigm so let's start talking about [00:10:18] paradigm so let's start talking about modeling and in France during this class [00:10:20] modeling and in France during this class so this is the the plan for today so [00:10:22] so this is the the plan for today so we're going to talk about three [00:10:24] we're going to talk about three different algorithms for for doing [00:10:26] different algorithms for for doing inference for search problems so so [00:10:29] inference for search problems so so we're going to talk about research which [00:10:30] we're going to talk about research which is the most naive thing one could do to [00:10:32] is the most naive thing one could do to solve some of these search problems but [00:10:34] solve some of these search problems but that's the simplest thing we can start [00:10:35] that's the simplest thing we can start with and then after that you want to [00:10:37] with and then after that you want to look at improvements of that doing [00:10:39] look at improvements of that doing dynamic programming or uniform cost [00:10:41] dynamic programming or uniform cost search based problem another flex pays [00:10:46] search based problem another flex pays problem the very fact that in a respect [00:10:48] problem the very fact that in a respect face problem the output that you give [00:10:50] face problem the output that you give does not influence an exchange and it [00:10:53] does not influence an exchange and it doesn't search yeah that's true yeah so [00:10:55] doesn't search yeah that's true yeah so so the output that you get and search 
[00:10:56] so the output that you get and search problem instance action it actually [00:10:58] problem instance action it actually influences your future yeah that's a [00:11:00] influences your future yeah that's a good way of actually thinking about it [00:11:03] all right so so let's talk about [00:11:05] all right so so let's talk about research so let's go back to our [00:11:08] research so let's go back to our favorite example okay so we have the [00:11:11] favorite example okay so we have the farm area cabbage go-to in both so let's [00:11:13] farm area cabbage go-to in both so let's think about all possible actions that [00:11:15] think about all possible actions that one can take when we have this farmer [00:11:18] one can take when we have this farmer cabbage goat interval okay so so a bunch [00:11:21] cabbage goat interval okay so so a bunch of things we can do is the farmer [00:11:22] of things we can do is the farmer I can go to the other side of the river [00:11:24] I can go to the other side of the river with the boat alone so this triangle [00:11:27] with the boat alone so this triangle here just means like going to the other [00:11:29] here just means like going to the other side of that de river the farmer can [00:11:32] side of that de river the farmer can take the cabbage so C's for cabbage G's [00:11:34] take the cabbage so C's for cabbage G's for it go to WC for both so another [00:11:37] for it go to WC for both so another possible action is the farmer takes a [00:11:39] possible action is the farmer takes a cabbage or the farmer takes a goat or [00:11:40] cabbage or the farmer takes a goat or the farmer takes a wolf and goes to the [00:11:42] the farmer takes a wolf and goes to the other side of the river you also have a [00:11:44] other side of the river you also have a bunch of other actions the farmer can [00:11:46] bunch of other actions the farmer can come back the farmer I can come back [00:11:47] come back the farmer I can come back 
with the cabbage come back with the goat [00:11:49] with the cabbage come back with the goat collaborative so I'm basically numerate [00:11:52] collaborative so I'm basically numerate enumerate all possible actions that that [00:11:55] enumerate all possible actions that that one could ever do and sure none of some [00:11:57] one could ever do and sure none of some of these might not be possible in [00:11:59] of these might not be possible in particular States but I'm just creating [00:12:01] particular States but I'm just creating this library of actions things that are [00:12:03] this library of actions things that are possible yeah so then when we think [00:12:06] possible yeah so then when we think about this as a search problem we could [00:12:09] about this as a search problem we could create a search tree which which [00:12:11] create a search tree which which basically starts from an initial state [00:12:13] basically starts from an initial state of where things are and then we can kind [00:12:16] of where things are and then we can kind of think about where we could go from [00:12:19] of think about where we could go from that initial state so the search tree is [00:12:20] that initial state so the search tree is more of a what if what if tree which [00:12:23] more of a what if what if tree which which allows you to think about what are [00:12:25] which allows you to think about what are the possible options that you can take [00:12:27] the possible options that you can take so conceptually what it looks like is [00:12:30] so conceptually what it looks like is you're starting with your initial state [00:12:32] you're starting with your initial state where everything is on one side of the [00:12:34] where everything is on one side of the river so those two lines are it is the [00:12:37] river so those two lines are it is the river white and you can take a bunch of [00:12:40] river white and you can take a bunch of actions right like one possible action 
[00:12:42] actions right like one possible action is you can take the cabbage and go to [00:12:44] is you can take the cabbage and go to the other side of the river and you end [00:12:46] the other side of the river and you end up in that state and that's a little not [00:12:48] up in that state and that's a little not a good state so I'm making that red well [00:12:50] a good state so I'm making that red well why is that because the wolf is going to [00:12:51] why is that because the wolf is going to eat the goat that's not that great okay [00:12:55] eat the goat that's not that great okay and every action every crossing let's [00:12:57] and every action every crossing let's say let's say every crossing takes cost [00:12:59] say let's say every crossing takes cost of one so that one that you see on the [00:13:01] of one so that one that you see on the edge is the cost of that action okay so [00:13:04] edge is the cost of that action okay so that didn't really work that well what [00:13:05] that didn't really work that well what else can I do well I can I can do [00:13:08] else can I do well I can I can do another action I can I can from the [00:13:10] another action I can I can from the initial State [00:13:10] initial State I can take the goat and go to the other [00:13:12] I can take the goat and go to the other side of the river that ends up in this [00:13:15] side of the river that ends up in this configuration from there the farmer [00:13:17] configuration from there the farmer could come back take the cabbage go to [00:13:19] could come back take the cabbage go to the other side end up in this [00:13:21] the other side end up in this configuration the farmer can come back [00:13:23] configuration the farmer can come back that's again not a great States because [00:13:25] that's again not a great States because cabbage and goat are left on the other [00:13:27] cabbage and goat are left on the other side of the river good is going to eat [00:13:29] side of the 
river good is going to eat the cabbage that's not great what else [00:13:31] the cabbage that's not great what else can I do well the farmer can come back [00:13:33] can I do well the farmer can come back with the goat and [00:13:35] with the goat and once the farmer comes back with the goat [00:13:36] once the farmer comes back with the goat the farmer leaves the goat takes the [00:13:39] the farmer leaves the goat takes the wolf goes to the other side comes back [00:13:41] wolf goes to the other side comes back gets the goat again and then okay so so [00:13:46] gets the goat again and then okay so so how many steps is to stay cool one two [00:13:48] how many steps is to stay cool one two three four five six and seven so so the [00:13:51] three four five six and seven so so the ones mice are seven that was a right [00:13:53] ones mice are seven that was a right answer and that is kind of the idea of [00:13:57] answer and that is kind of the idea of getting to this so you could have this [00:14:10] getting to this so you could have this giant tree where you go to different [00:14:12] giant tree where you go to different states but we can actually have like a [00:14:14] states but we can actually have like a counter that tells you if I visited that [00:14:16] counter that tells you if I visited that state and if you have visited that state [00:14:17] state and if you have visited that state maybe you don't want to go there again [00:14:18] maybe you don't want to go there again because because you have already [00:14:19] because because you have already explored all the possible actions from [00:14:21] explored all the possible actions from there you're not done with this tree [00:14:23] there you're not done with this tree though right like I found this this good [00:14:26] though right like I found this this good state here but maybe there's a better [00:14:27] state here but maybe there's a better way of like getting there I don't know [00:14:29] way of like 
getting there I don't know yet I haven't explored everything so so [00:14:31] yet I haven't explored everything so so what I can do is I can actually explore [00:14:33] what I can do is I can actually explore all these other things that that one [00:14:34] all these other things that that one could do not gonna go over them but [00:14:37] could do not gonna go over them but there is another solution and turns out [00:14:39] there is another solution and turns out that other solution also takes seven [00:14:41] that other solution also takes seven steps so it's not necessarily a better [00:14:42] steps so it's not necessarily a better solution but but you got to explore all [00:14:44] solution but but you got to explore all of that because there could be another [00:14:45] of that because there could be another solution later on that that is better [00:14:48] solution later on that that is better than the seven steps all right the wiser [00:14:58] than the seven steps all right the wiser okay all right so so this is how the [00:15:03] okay all right so so this is how the search tree looks like oh that's a very [00:15:09] search tree looks like oh that's a very good point thank you for saying so for [00:15:11] good point thank you for saying so for CPD students I'll try to repeat the [00:15:13] CPD students I'll try to repeat the questions I always forget this I'll try [00:15:15] questions I always forget this I'll try to repeat the questions the question was [00:15:17] to repeat the questions the question was was the slice or the slides are up they [00:15:19] was the slice or the slides are up they are up they should be up okay all right [00:15:22] are up they should be up okay all right so going back to our search problem so [00:15:25] so going back to our search problem so we can try to formalize this search [00:15:27] we can try to formalize this search problem so let's actually think about it [00:15:29] problem so let's actually think about it more formally so 
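The hand-drawn exploration just described, expanding states, pruning unsafe ones, and skipping states already visited, can be sketched as a small breadth-first search. The state encoding and names here are my own, not code from the lecture; since every crossing costs 1, the search depth directly gives the minimum number of crossings.

```python
from collections import deque

# Each state records which bank (0 or 1) the farmer, wolf, goat,
# and cabbage are on; every crossing costs 1.
START, GOAL = (0, 0, 0, 0), (1, 1, 1, 1)

def safe(state):
    f, w, g, c = state
    if w == g != f:   # wolf alone with the goat: wolf eats goat
        return False
    if g == c != f:   # goat alone with the cabbage: goat eats cabbage
        return False
    return True

def successors(state):
    f = state[0]
    for cargo in (None, 1, 2, 3):          # cross alone, or with W/G/C
        if cargo is not None and state[cargo] != f:
            continue                        # cargo must be on the farmer's bank
        nxt = list(state)
        nxt[0] = 1 - f                      # the farmer always crosses
        if cargo is not None:
            nxt[cargo] = 1 - f
        nxt = tuple(nxt)
        if safe(nxt):
            yield nxt

def min_crossings():
    frontier, seen = deque([(START, 0)]), {START}
    while frontier:
        state, depth = frontier.popleft()
        if state == GOAL:
            return depth
        for nxt in successors(state):
            if nxt not in seen:             # skip already-visited states
                seen.add(nxt)
                frontier.append((nxt, depth + 1))

print(min_crossings())  # 7, matching the seven-step solutions in the tree
```

Running it confirms the count from the lecture: the best solution takes seven crossings.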
[00:15:22] OK, all right, so going back to our search problem: we can try to formalize this search problem, so let's actually think about it more formally. What are the things that we need to keep track of? We have a start state, so let's define s_start to be the start state. In addition to that, we can define this function called Actions, which returns all possible actions from a state: Actions(s) is a function of the state, and if I'm in a state, it tells me what actions I can take from there. I can then define this cost function: Cost(s, a) takes a state and an action and tells me what the cost of taking that action is. In this example the cost of crossing the river was just one, but you can imagine having different cost values. We can have a successor function, Succ(s, a), that takes a state and an action and tells us where we end up: if I'm in state s and I take action a, where would I end up? And then we are going to define IsEnd(s), the function which basically checks whether you're in an end state, where we don't have any other actions to take. You can think of it as a finite-state-machine type of way of looking at it. Yeah, and we use a similar type of formalism for MDPs and games too, so it's just a good idea to get all these pieces written down: start state, actions, costs, successors (the transitions), and IsEnd.

[00:16:55] So the question: OK, so the actions depend on the state. You start from the start state, where you haven't taken any actions yet, right, and then from that start state you can think about all possible actions you can take, like right up there: you're in that start state, and you consider all possible actions. Those actions depend on the current state, but they don't depend on the future state. So based on my current state, where everything is on one side of the river, I can think about all possible actions I can take, and I know where I end up; and then the next action depends on that state. So it's a sequential thing.

[00:17:35] Yes, another question: you have all the information on the actions and the costs beforehand, so how is this conceptually different from, say, a min-cost-flow convex optimization? OK, so how is it different from a convex-optimization type of approach? Well, you have an objective here, and you can think of what that objective is, and based on what that objective is, you can have different methods for solving it. So you can basically formulate this as an optimization problem, where you look for the solution to a search problem by solving an optimization problem; that's a perfectly valid way of doing it. And we are going to talk about various types of methods for solving this problem today.
[00:18:09] Yeah. All right, so let's look at another example. This is a transportation problem. So basically, we have street blocks numbered 1 through n, so 1, 2, 3, and so on: these are street blocks, and what we want to do is travel from block 1 to block n. We have two possible actions. At any state s, I can either walk, and if I walk I end up at s + 1, so if I'm at 3 I'm going to end up at 4, and walking takes 1 minute; or I can take this magic tram, and the magic tram takes any state s to 2s, so if I'm at 3 then I'm going to end up at 6 by taking the magic tram, and the magic tram always takes 2 minutes, no matter where you start. So if I'm at 2 I'll end up at 4, or if I'm at 5 I can end up at 10 by taking the tram. OK, so I have two possible actions in any of these states, and what I want to do is go from 1 to n, and I want to do that in the shortest time possible, that is, with the least amount of cost. That's the problem.

[00:19:40] All right, so this is what the search problem is, and what we want to do first is formalize it, and I'm going to do that here. I'm not going to do live solutions, because I'm not Percy, and I did that once and it was a disaster; these were taped in 2018. But basically we're going to go over it together. So we're going to define the search problem: we're going to define a class for the transportation problem, and we're going to separate our search problem from our algorithms, because remember, modeling is separate from inference. So let's just have a constructor for this transportation problem. It takes n, because we have n blocks; n is the number of blocks.

[00:20:33] All right, so then we need to have a start state: we are starting from 1, block one. And then we need to define isEnd: isEnd basically checks whether you have reached n or not, because that's where we have to get to.

[00:20:53] All right, so what else do we need? We have a successor function, and we also have a cost function; I'm going to put both of them together, because that is just easier. So for the successor-and-cost function, I'm saying: let's just give it a state s, and given that state, it's going to return triples of (action, new state, cost). I give it a state, and it returns all possible actions together with the new states I can end up at and how much each one costs. So what are my options? Well, if I'm in state s, I can walk to s + 1, and that costs 1; or if I'm in state s, I can take the tram, end up at 2s, and that costs 2. OK, so that's how I'm creating my triples. And I need to check that I don't go past the n-th block; remember, we have n blocks, so we don't want to go past block n, and that check just makes sure we stay at or below the n-th block. And then this is what my successor-and-cost function will return, that list of triples. So let's just return that.

[00:21:57] OK, so that is my transportation problem. Let's make sure it does the thing we want. Let's say we have 10 blocks, and now I want to print what my successor-and-cost function returns; let's say I'm asking for the successors and costs of state 3. What should I get? Well, from 3 I can take two actions: I can either walk or I can take the tram. If I walk, it costs one; if I take the tram, it costs two; and I'll end up at 4 or at 6.
do one thing [00:22:32] if I'm a state 9 I can only do one thing I can walk right cuz remember the the [00:22:34] I can walk right cuz remember the the block is number of blocks is 10 and I [00:22:36] block is number of blocks is 10 and I can't go beyond that so alright okay so [00:22:42] can't go beyond that so alright okay so that was yeah let's go back here so that [00:22:47] that was yeah let's go back here so that was just defining the search problem [00:22:50] was just defining the search problem yeah and I haven't told you guys like [00:22:54] yeah and I haven't told you guys like how to solve it right this is we were [00:22:56] how to solve it right this is we were just doing the modeling right now so we [00:22:58] just doing the modeling right now so we just modeled this problem just coated it [00:23:00] just modeled this problem just coated it up modeling it means what is this what [00:23:02] up modeling it means what is this what are the actions what is a successor [00:23:04] are the actions what is a successor function what is the cost function [00:23:06] function what is the cost function defining an is end function say but what [00:23:09] defining an is end function say but what the initial state is okay so so now I [00:23:11] the initial state is okay so so now I think we are ready to think about the [00:23:14] think we are ready to think about the algorithm in terms of like going in [00:23:16] algorithm in terms of like going in solving these types of search problems [00:23:18] solving these types of search problems so the simplest algorithm we want to [00:23:21] so the simplest algorithm we want to talk about is backtracking search so the [00:23:25] talk about is backtracking search so the idea of backtracking search is maybe I [00:23:28] idea of backtracking search is maybe I can draw a tree here is you're starting [00:23:31] can draw a tree here is you're starting from an initial state and then you have [00:23:32] from an initial state and then 
[00:23:35] You end up in some state, and you have a bunch of other possible actions; let's say two actions are possible. This exponentially blows up, so I'm going to stop soon. All right, so you create this tree, and the tree has some branching factor, the number of actions you have at every state, and it also has some depth, how many levels you go down; let me just denote that by D. And now the solutions are down at the leaves of this tree, right? So we want to figure out what those solutions are, and backtracking search just does the simplest thing possible: it starts from the initial state, goes all the way down, and if it doesn't find a solution it goes back and tries again, and tries again, and it goes over the whole tree, because there might be a better solution down here too, so it needs to actually go over all of the tree. Okay, so I'm going to have a table of algorithms, because we're going to talk about a few of them: the algorithm, what sort of costs it allows, how bad it is in terms of time, and how bad it is in terms of space. If you've taken an algorithms course, some of these are probably familiar. All right, so we talked about backtracking search. Backtracking search is basically this algorithm that goes through pretty much everything, and it allows any type of cost: I can put pretty much any cost I want on these edges, because I'm going over all of the tree, so it doesn't matter what these costs are. Okay, so how bad is this in terms of time? In terms of time I'm going over the full tree, so this has the exponential blow-up where I'm looking at order of b^D, where b is again my branching factor and D is the depth of the tree. In terms of time this is not a good algorithm: I have to go over everything in the tree, and that's the size of my tree. In terms of space, what I mean is that I need to be able to recover the sequence of actions I took to get to some solution. So say my solution is down here; for me to know how I got here, the things I need to store are the ancestors of this node, and there are D of them. So in terms of space this algorithm takes order of D, because that's all I need to keep in memory to recover everything. A student asked: wouldn't the space be bigger, b^D as well, because until you get there you need the space to hold everything? No; we will actually talk about breadth-first search later, which does require larger space. The reason you can forget things here is that the only history I need to keep track of is this particular branch: I don't need to keep the history of all these other nodes, I can throw those out. But for something else like breadth-first search, which we'll talk about in a few slides, you actually do need to keep track of everything else. So let me get back to that in a few slides, but for this one I think that's the clean idea: I want to know how I got there, and for that I just need this branch. Another question: is the objective to find the minimum cost to reach a point, or just to find whether you can get there? It depends on what your objective is, on what the search problem is asking. In the case of the farmer example, the search problem asks you to move everything to the other side of the river, so you have that criterion, and you also want the minimum-cost way, so you add that criterion too. It really depends on what the search problem is asking, and some of these nodes might be solutions while some of them might not be. All right, so let's just look at these on the slide. The memory is order of D; it's actually small, which is nice. In terms of time this is not a great algorithm, because even if your branching factor is 2, if the depth of the tree is 50 then this blows up immediately. A lot of the tree search algorithms we're going to talk about have the same problem; they pretty much have the same time
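To put numbers on that remark, even a branching factor of 2 at depth 50 is hopeless to enumerate:

```python
# Size of the full search tree at branching factor b and depth D.
b, D = 2, 50
print(b ** D)  # 1125899906842624 -- about 10^15 leaves
```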
[00:28:08] complexity; we're going to look at very minimal improvements of them, and after that we'll talk about dynamic programming and uniform cost search, which are polynomial algorithms that are much better. All right, so let's actually go back to the tram example and try to write up what backtracking search does. So we defined our model; our model is the search problem, this particular transportation search problem, but it could be anything else. Now we're going to have this main section where we put in our algorithms, and we're going to write them as generally as possible, so we can apply them to other types of search problems. So let's define backtracking search; it takes a search problem, and it can take the transportation problem. All right, and then basically, in backtracking search, what we're doing is recursing on every state, given the history of getting there and the total cost it took us to get there. So at a state, having some history and some accumulated cost so far, we are going to recurse on that state and look at the children of that state; we're going to explore the rest of the subtree from that particular state. All right, so how do we do that? Well, we've got to check whether we're in an end state, and if we are, we can actually update the best solution so far. Let's put that as a to-do. So there are a bunch of things we need to do: figure out if we are in an end state; if we are, update our best solution; if we're not, recurse on the children. All right, we can fill that in later, and in general this recurse function is going to be called on the start state, so let's actually do that too. What backtracking search does is call this recurse function on the initial state, with a history of none, since we don't have any history yet, and a cost of zero, because we haven't really gone anywhere. So we start with the start state and call recurse on it. And how do we recurse on children? Well, we have defined this successor-and-cost function, so by calling it on the state we get a triple of (action, new state, cost). Then we can basically recurse on the new state. I'm not always going to write the histories out in this code; you do need to keep track of the history too, but let's not worry about it. Oh, I guess I am putting it in this one; in the later ones I won't. Basically, the history keeps track of how you got there, and the total cost is what you've accumulated so far plus the cost of this new state and action. Okay, so we need to keep track of the best solution so far, so I'm just going to define a dictionary here, to keep track of it and to play nicely with Python scoping. And then the place we update our best solution so far is that to-do we left: if we're in an end state, we can update the best solution so far. What do we want in our best solution? Well, we want to know what the cost is, so we can start with a cost of infinity, and anything below infinity is better; and we start with an empty history, but we're going to update that history too. That's the initialization of the best solution so far. Then we update it: if we're in an end state and the total cost we have right now is smaller than the best solution so far, we update that best solution, and we update its history with whatever the current history is. All right, and that's it, that's backtracking search. Okay, so let's just make sure it does the thing; to do that, we've actually got to return the best solution so far. All right, so now we have defined this transportation problem, and what I want to do is call backtracking search on it. That all sounds good. I also need to write a print function, to be able to print things, so I'm going to write a generic print function that we can call on any of these types of problems. Let's define a printSolution function that just prints things the way we want them.
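Putting the pieces together, here is a self-contained sketch of the backtracking search and the printSolution helper just described (the problem definition is repeated so the snippet runs on its own; exact names are assumptions):

```python
# The tram problem again, so this snippet is self-contained.
class TransportationProblem:
    def __init__(self, N):
        self.N = N  # number of blocks

    def startState(self):
        return 1

    def isEnd(self, state):
        return state == self.N

    def succAndCost(self, state):
        result = []
        if state + 1 <= self.N:
            result.append(('walk', state + 1, 1))
        if 2 * state <= self.N:
            result.append(('tram', 2 * state, 2))
        return result

def backtrackingSearch(problem):
    # Best solution found so far; a dict lets the nested function
    # update it without fighting Python's assignment scoping rules.
    best = {'cost': float('inf'), 'history': None}

    def recurse(state, history, totalCost):
        if problem.isEnd(state):
            # Update the best solution if this one is cheaper.
            if totalCost < best['cost']:
                best['cost'] = totalCost
                best['history'] = history
            return
        # Otherwise, recurse on the children of this state.
        for action, newState, cost in problem.succAndCost(state):
            recurse(newState, history + [(action, newState, cost)],
                    totalCost + cost)

    recurse(problem.startState(), history=[], totalCost=0)
    return (best['cost'], best['history'])

def printSolution(solution):
    totalCost, history = solution
    print('totalCost:', totalCost)
    for item in history:
        print(item)

printSolution(backtrackingSearch(TransportationProblem(N=10)))
# totalCost: 6, via walking to block 5 and one tram ride to 10
```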
[00:33:11] We get the solution, unpack the cost and history, and just print the cost and history nicely. All right, so I can use this printSolution for pretty much all the other algorithms we'll talk about today too, and it's going to show how we got there with the history. So now I have my print function, I have my backtracking search algorithm, and I've defined my transportation problem, so I can just call it on this transportation problem with ten blocks. As you can see here, the total cost is 6. What this means is that for going from block 1 to block 10, this is the best solution I got: walk, walk, walk, walk, and after that take the tram. I end up in 5, and then it's actually worth taking the tram and paying cost 2. Let's try it out for 20. What do you think the answer for 20 is? Similar here: walk, walk, walk until we get to 5.
[00:34:17] Then we take the tram, and we take the tram again, across. And if it is 100, it's a little bit more interesting: you walk, then you take the tram, and you get to 24, and you want that one walking step to get to 25, which is a good state because then you can just double it to 50 and again to 100. So you walk for that one step and take the tram again. Now, what if I want to try a much larger number of blocks? Is this going to work? No, because remember that the time was order of b^D, and that wasn't so great. Let's try it. Oh, maximum recursion depth exceeded; we can fix that. In Python you can set your recursion limit to be whatever you want, so let's try that. Is this going to work? [Laughter] Well, now it's just going to take a long time; it's not going to give you an answer, because it's just going to take a very long time. All right.
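The recursion-limit fix mentioned here is `sys.setrecursionlimit`; raising the cap removes the RecursionError, but not the exponential running time:

```python
import sys

# CPython limits recursion depth (about 1000 frames by default), so a
# deep search tree raises RecursionError. Raising the limit lets the
# recursion proceed, but the O(b ** D) time cost remains.
sys.setrecursionlimit(100000)
print(sys.getrecursionlimit())  # 100000
```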
[00:35:28] Okay, let's go back here. All right, so that was backtracking search: all it was doing was going over all of this tree, and it was taking exponential time, as you saw. We just tried it out on that transportation problem, and that ran fine. So we defined a search problem and used this really simple search algorithm to find solutions for it, and that's what we have so far. Now what we want to do is come up with a few improvements on this backtracking search. Again, don't get your hopes up, it's not that big of an improvement, but we can do something better. So the first improvement we want to make is an algorithm called depth-first search; you might have heard of it as DFS. The restriction that DFS puts in is that your cost has to be zero. So let me actually draw a line between them in the table. Okay, so we're talking about DFS, and the restriction is that the cost has to be zero. What DFS does is basically exactly the same thing as backtracking search, but once it finds a solution down here, it is done; it doesn't explore the rest of the tree. And the reason it can do that is that the cost of all these edges is zero: if the cost of all these edges is zero, then once I find a solution, I have found a solution, and I don't need to find a better one, because this one is good enough; anything else I find also has a cost of zero, so I can just return this. An example of that is the Rubik's cube: if you find a solution, then you have found a solution. There are a million different ways of getting to a solution, but you just want one, and if you find one, you're happy, you're done. Okay, so as you can see, this is a very, very slight improvement over backtracking search. In terms of space it's still the same thing, order of D, so nothing has changed; it's pretty good, order of D. In terms of time, in practice it is better, because in practice if I find a solution I can just be done and not worry about the rest of the tree. But in theory, the worst-case scenario is still trying out all of the tree, so you write the worst case as order of b^D. So nothing has really changed in terms of the exponential blow-up. A student asked: that tree assumes the subproblems don't overlap, right? You're branching off different states, but in fact subproblems could overlap; in the tram problem you can get to the same place with a different history, but the rest is the same. Yeah, so the question is: do subproblems overlap here, or don't they? You could be in a setting where two subproblems do overlap, and you could add an extra constraint that says: if I've visited a state, don't add it to the tree again. So you have that option, or you have the option of only going down the tree to some particular depth and not trying out everything. In the setting we have here, in its most general form, you're going over all the states and all the possible actions. All right, so that was DFS. The idea of DFS, again, is that you're doing backtracking search and you just stop when you find a solution, because the cost is zero here. So in terms of space, order of D; in terms of time, it's still order of b^D. All right, so that was DFS. We have another algorithm called breadth-first search, BFS, and this is useful when the cost is some constant; it doesn't need to be zero, just some positive constant. What that means is that all these edges have the same
solution here; this tree doesn't need to be nicely formed — I can have a tree that looks like this. [00:40:11] Okay, so if I have a tree that looks like this, with breadth-first search I'm going to try out this layer and see if this guy is a solution; if it's not, I'm going to try this guy — is this the solution? If not, I'm going to try here, and here, and then when I find a solution, when I get here, I'm done. [00:40:24] Right — if I find a solution here, I know it took 2c to get here, two of these c values, and if there is any other solution anywhere else, in this subtree or in this subtree, those solutions are going to be worse than this one, because they're going to have a higher cost — because the cost is constant throughout. [00:40:48] So it's useful if your solutions are somewhere high up in the tree, and then you can find them quickly. So in
terms of time, I get some improvements here, because I can call this depth — the depth of the shallowest solution — a shorter depth, small d. [00:41:02] And in terms of time it's still exponential, but it's order of b to the small d, and this is actually a huge improvement, because if you think about it, the trees become exponentially larger: these lower levels have a lot of things you need to explore. If you have a branching factor of 10, the next layer has a hundred things in it, right? So going down these layers is actually pretty bad. [00:41:27] So the fact that with breadth-first search I can improve the time and limit it to a particular depth — that's pretty good. Still exponential, but pretty good. [Student: with no negative costs, can you also assume this is the best solution?] Yeah, exactly — you're assuming that there are no negative costs, so at
[00:41:46] this point, I know this is the best solution and I'm done — I don't look at anything else. [00:41:50] The problem with breadth-first search is — there's a question there, sorry. [Student: are we assuming the costs are the same?] Yeah, we are assuming all the costs are the same, like all the costs are one. If I don't assume that — if, say, some of these costs are 100 — then there might be a cheaper solution somewhere else; yeah, you need to explore the rest if the costs aren't the same, basically. [00:42:12] Alright, so the problem with BFS is that in terms of memory we are losing: you need to actually keep track of all the nodes that you have explored so far, so in terms of memory this is going to be order of b to the d, kind of similar to the time. [00:42:32] And the reason is: I have explored this guy, and after exploring this guy I still need to have a history of where
it's [00:42:42] going to go, because next time around, when I try out this layer, I need to know everything about this parent. And even when I explore here and this is not a solution, I need to store everything about it, because maybe I don't find a solution at this level and I need to come down — and when I come down, I need to know everything about these nodes. [00:42:59] So I need to store pretty much everything about the tree until I find my solution, and that's where you lose with breadth-first search: in terms of space it's not going to be that great. So in terms of space it's now order of b to the d — a lot worse than what we had before. In terms of time it is better — still exponential, but better. [00:43:18] Okay, alright. So now let's talk about one more algorithm, and then after that we'll jump to dynamic programming. There's a question back
where is a question back here yeah so it is exponential I agree [00:43:39] here yeah so it is exponential I agree so D can be the same as Big D but in [00:43:41] so D can be the same as Big D but in practice if small D is not the same as [00:43:43] practice if small D is not the same as Big D you're winning a lot because [00:43:46] Big D you're winning a lot because because yeah these lower layers are so [00:43:48] because yeah these lower layers are so bad that that people actually like to [00:43:51] bad that that people actually like to call the call the fact that [00:43:53] call the call the fact that or don't be to the small D EFS be big [00:43:59] or don't be to the small D EFS be big worst case in there for the time and [00:44:01] worst case in there for the time and awkward yet so Deena fest needs to go [00:44:06] awkward yet so Deena fest needs to go all the way down to these lower lower [00:44:08] all the way down to these lower lower levels what vnfs can stop at every level [00:44:11] levels what vnfs can stop at every level because it's doing a level by level yeah [00:44:17] because it's doing a level by level yeah so the reason is yeah so like if you're [00:44:19] so the reason is yeah so like if you're saying okay so in DFS we were also [00:44:21] saying okay so in DFS we were also saving some time right like why aren't [00:44:22] saving some time right like why aren't you like calling that out and then the [00:44:24] you like calling that out and then the reason is with DFS you still need to get [00:44:26] reason is with DFS you still need to get to these like more layers and that is [00:44:28] to these like more layers and that is the like that is the place that you're [00:44:31] the like that is the place that you're losing on time so so the fact that [00:44:32] losing on time so so the fact that you're still like losing on time and [00:44:34] you're still like losing on time and sure like you haven't explored these [00:44:35] sure like you haven't 
explored these other ones, but you have already gone down to these lower subtrees, and that's pretty bad. That is why it's order of b to the D for DFS. [00:44:46] Alright, so the last algorithm I want to talk about — this is a cool idea that tries to combine the benefits of BFS and DFS, and it's called DFS iterative deepening. [00:45:01] What this algorithm does is it basically goes level by level, same as BFS, because that way, if you find a solution, you're done — everything is great, right? But what it does is, for every level, it runs a full DFS. And it feels like it's going to take a long time, but it's actually good, because again, if you find your solution early on, it doesn't matter that you have run like a million DFS's so far. [00:45:29] So an analogy for it: imagine that you have a dog, and that dog is DFS, and it's on a leash, and you have
like a short leash; and when it is on that leash it's going to do a DFS and try to search all the space, and it doesn't find anything, so it comes back. Then you extend the leash a little bit, and it's going to search everything again and do a DFS; it comes back, doesn't find anything; you extend the leash again. [00:45:52] So that's the idea — extending the leash is this idea of extending your levels. Okay, so how does DFS iterative deepening — yes? [00:46:06] [Student: if the solution is at the bottom of the tree, it's even worse than both of them.] Yes, exactly — that's a good point. The point, as was mentioned, is that if your solution is down here, you're screwed: it's worse than DFS or BFS, right? You're doing all these DFS's through a higher-level BFS, and it's a terrible situation. But again, in practice we were hoping
the [00:46:35] solutions are not going to end up down the tree — but yeah, if the solutions are down the tree, then you're not gaining anything. [00:46:44] [Student: in what problems do you think DFS iterative deepening would be useful?] Okay, so the question is: in what problems do we think DFS iterative deepening is useful? In general, for problems where I think BFS is going to be useful, usually DFS iterative deepening is useful too. The reason I would think that is that there is some structure to the problem that makes me think I would find my solution early. [00:47:08] So if I have some reason — something about the structure of the problem — to think solutions are at a low depth, I should use one of these algorithms; and DFS iterative deepening helps in terms of space too, so I might as well use that. Alright, so
in terms of space, it's going to be order of small d, and in terms of time it gets the same benefits as BFS. [00:47:38] So that's nice. And again, because it has this BFS outer loop, it has the same sort of constraint on the cost: it's got to be a constant cost. [00:47:52] Alright, so that is our table, and looking at this table, in terms of time you're just not doing well, right? We have these exponential-time algorithms here. We could avoid the exponential space using something like DFS iterative deepening, but still, this time column is just not that great. [00:48:13] And what we want to do now is talk about search algorithms that bring this exponential time down to polynomial time somehow — and there is no magic; we'll talk about how. Dynamic programming is the first.
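The "dog on a leash" idea from the table can be sketched in code: run a depth-limited DFS, and if it finds nothing, extend the limit by one and rerun. This is a minimal illustration, not the course's implementation — the `succ` function, the toy tree, and all names here are made up for the example.

```python
# Sketch of DFS iterative deepening: a depth-limited DFS ("the leash"),
# rerun with a longer leash each time it comes back empty-handed.
# States are integers; succ(s) lists children; is_goal tests for a solution.

def depth_limited_dfs(state, limit, succ, is_goal, path):
    """Backtracking DFS that refuses to go deeper than `limit` levels."""
    if is_goal(state):
        return path
    if limit == 0:
        return None  # leash ran out; back up
    for child in succ(state):
        result = depth_limited_dfs(child, limit - 1, succ, is_goal, path + [child])
        if result is not None:
            return result
    return None

def iterative_deepening(start, succ, is_goal, max_depth):
    """Extend the leash one level at a time, re-running a full DFS each time."""
    for limit in range(max_depth + 1):
        result = depth_limited_dfs(start, limit, succ, is_goal, [start])
        if result is not None:
            return result  # found at the shallowest possible depth, like BFS
    return None

# Toy tree: 1 -> 2,3 ; 2 -> 4,5 ; 3 -> 6,7 ; the goal is node 6.
children = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
succ = lambda s: children.get(s, [])
print(iterative_deepening(1, succ, lambda s: s == 6, max_depth=5))  # [1, 3, 6]
```

Because the outer loop tries shallower limits first, the path returned is a shallowest solution (the BFS-like benefit), while each inner DFS only stores the current path (the DFS-like space benefit).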
[00:48:36] So, the way iterative deepening works is: it sets the level — say the level is 1. If the level is 1, I'm going to do a full DFS; and because I'm doing a full DFS, in terms of space it's the same as DFS. [00:48:51] Say the depth where I find a solution is small d. Now I say the level is 2 — if the level is 2, I'm going to do a full DFS. [00:49:04] And when I do a full DFS, in terms of space I just need to remember my parents, so that's why it's order of D in terms of space; and in terms of time it's order of b to the small d, because if I find my solution here, I'm done — I don't need to explore anything else — and that is exponential in the smaller depth as opposed to the longer one, similar to BFS. [00:49:34] [Student:] Sorry — I still don't understand why, let's say
[00:49:35] I still don't understand why let's say like that's it okay so that's a very [00:49:40] like that's it okay so that's a very good question so yeah I think I know it [00:49:41] good question so yeah I think I know it so you're asking small D small D was the [00:49:43] so you're asking small D small D was the same as Big D if I had my solutions down [00:49:45] same as Big D if I had my solutions down here oh why am I like differentiating [00:49:47] here oh why am I like differentiating here between a small D and Big D right [00:49:50] here between a small D and Big D right is that what you're asking for away [00:49:54] quite large like smoothies log oh I see [00:50:09] quite large like smoothies log oh I see what you're saying so you're saying okay [00:50:10] what you're saying so you're saying okay like when I'm doing when I'm performing [00:50:12] like when I'm doing when I'm performing DFS iterative deepening then I'm doing [00:50:16] DFS iterative deepening then I'm doing DF DF SS so sure it's order of B to the [00:50:19] DF DF SS so sure it's order of B to the D for each of them but then I'm doing D [00:50:21] D for each of them but then I'm doing D of them and if these really large I [00:50:22] of them and if these really large I should put that here sure I do agree [00:50:25] should put that here sure I do agree that is the right time but again like in [00:50:27] that is the right time but again like in the case of this exponential this is so [00:50:30] the case of this exponential this is so bad that we are just dropping that like [00:50:32] bad that we are just dropping that like don't even worry about that extra [00:50:33] don't even worry about that extra t-that's come see but it is true you [00:50:35] t-that's come see but it is true you need to have that extra D like in [00:50:37] need to have that extra D like in general if you want to talk about it I [00:50:38] general if you want to talk about it I don't want to move on to dynamic [00:50:39] 
programming — but last question, just on top of that. [Student: presumably, though, you're saving the work that you've done during the prior iteration, so you're not really computing anything larger than the capital D, correct?] Yeah, that's right — [00:50:51] the worst-case scenario is order of b to the D. [00:50:55] Alright, so let's move on to dynamic programming. So what does dynamic programming do? Maybe I'll still use this, because I might need it later — okay, let me erase my tram drawing here. [00:51:11] So the idea of dynamic programming — we have already seen this in the first lecture — is: I have a state s, and I want to end up in some end state. To do that, I can take an action a that takes me to s prime — I can end up in s prime with the cost of this action a — and then from there I can do a bunch of things, I don't know what, but I'll end up in some end state.
and what I'm [00:51:41] in some end State and what I'm interested in actually computing is for [00:51:44] interested in actually computing is for this state s is to find what is future [00:51:48] this state s is to find what is future cost of and this part of it is future [00:51:54] cost of and this part of it is future cost that's fine and I don't know what [00:51:57] cost that's fine and I don't know what it is but I can just leave it as future [00:51:59] it is but I can just leave it as future cost of s prime so if I want to find [00:52:01] cost of s prime so if I want to find what future cost of s is maybe I should [00:52:04] what future cost of s is maybe I should say feels a little bit to the right one [00:52:06] say feels a little bit to the right one sense right cost of si for this edge [00:52:10] sense right cost of si for this edge erase this what I'm interested in [00:52:13] erase this what I'm interested in finding is future cost of my stakes well [00:52:17] finding is future cost of my stakes well what is that equal to well that's going [00:52:20] what is that equal to well that's going to be equal to this cost of Si right [00:52:23] to be equal to this cost of Si right like at state s I'm gonna take action [00:52:24] like at state s I'm gonna take action okay so it's gonna be cost of Si plus [00:52:30] okay so it's gonna be cost of Si plus future cost of s Prime again I don't [00:52:32] future cost of s Prime again I don't know what that is but that's future [00:52:34] know what that is but that's future Dorset's problem so this is future cost [00:52:38] Dorset's problem so this is future cost of s Prime [00:52:40] of s Prime and then you might ask well what is a [00:52:42] and then you might ask well what is a where does a come from how do I know [00:52:44] where does a come from how do I know what a is I don't know [00:52:46] what a is I don't know I'm gonna pick an a that minimizes this [00:52:49] I'm gonna pick an a that minimizes this some around 
it. [00:52:57] Okay, so FutureCost(s) is just going to be equal to the minimum, over all possible actions, of Cost(s, a) plus FutureCost(s'), and it's going to be zero if you're in an end state — if IsEnd(s) is true. [00:53:16] So if I already know I'm in an end state, there is no future cost — it's going to be equal to 0; otherwise, the future cost is just the cost of going from s to the next state, plus the future cost from there. [00:53:33] So that is how one would go about formalizing this as a dynamic programming problem. And how do I find what s prime is? Well, I wrote this successor-and-cost function in my code — remember, we know how to find a successor given that we are in state s and we are taking action a — so s prime is just calling that successor function on s and a. [00:53:59] Alright, so let's go back to a route-finding example.
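The recurrence just stated — FutureCost(s) = 0 if IsEnd(s), and min over a of Cost(s, a) + FutureCost(Succ(s, a)) otherwise — can be written almost verbatim as code. A minimal sketch: the successor-and-cost interface and the tiny walk/jump example below are hypothetical stand-ins, not the lecture's actual code.

```python
# Naive translation of the recurrence:
#   FutureCost(s) = 0                                            if IsEnd(s)
#                 = min over a of [ Cost(s,a) + FutureCost(s') ] otherwise
# succ_and_cost and is_end are stand-ins for the problem's interface.

def future_cost(s, succ_and_cost, is_end):
    if is_end(s):
        return 0
    return min(cost + future_cost(s_prime, succ_and_cost, is_end)
               for action, s_prime, cost in succ_and_cost(s))

# Tiny made-up example: get from block 1 to block 3;
# a step costs 1, a two-block jump costs 3.
def succ_and_cost(s):
    triples = [('step', s + 1, 1)]
    if s + 2 <= 3:
        triples.append(('jump', s + 2, 3))
    return triples

print(future_cost(1, succ_and_cost, lambda s: s == 3))  # two steps beat one jump: 2
```

As written this is plain recursion with no caching, so it re-solves overlapping subproblems — exactly the inefficiency the rest of the lecture removes.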
This is a slightly different route-finding example: let's say that you want to find the minimum-cost path going from city 1 to some city n. We're moving forward — we can always just move forward — and it costs c_ij to go from city i to city j. [00:54:20] Okay, so this is my new search problem, and this is how the tree would look. If I draw the search tree for this, I can start from city 1, and I can end up in city 2 or 3 or 4; then if I'm in city 2, I can end up in 3 or 4; from 3, I can end up in 4; and so on. [00:54:39] I can have a much larger version of it: if I'm talking about going to city 7, then I have this type of tree. And just by looking at this tree, you see all these subtrees being repeated throughout, right? Just looking at 5, the future cost of 5 is going to be the same thing throughout.
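The repetition can be made concrete by counting. A small illustrative sketch (the counting code is mine, not the lecture's): enumerate the naive search tree for the forward-moving cities problem and count how many times each city gets expanded, versus how many distinct states there really are. The costs c_ij are irrelevant for the count, so they are omitted.

```python
# Cities 1..n, and you may only move forward (i -> j for j > i).
# The naive tree expands the same city over and over, even though the
# future cost from that city is always the same.

from collections import Counter

def count_expansions(n):
    counts = Counter()
    def explore(city):
        counts[city] += 1
        for nxt in range(city + 1, n + 1):
            explore(nxt)
    explore(1)
    return counts

counts = count_expansions(7)
print(counts[5])    # city 5 is expanded in 8 different subtrees
print(len(counts))  # yet there are only 7 distinct states
```

City k is expanded once per forward path from 1 to k, i.e. 2^(k-2) times, so the whole tree has exponentially many nodes while the graph has only n states — the saving dynamic programming exploits.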
And if I use something like the tree search we have talked about, then I have to go and explore this whole tree, and that's going to be really time-consuming. [00:55:08] So the key insight here is: this value of future cost only depends on the state — it only depends on where I am right now. And because of that, maybe I can just store the value the first time I compute the future cost of 5, and in the future I just look it up and don't recompute the future cost of 5. [00:55:28] So the observation here is that the future cost only depends on the current city. So my state in this case is the current city, and that state is enough for me to compute the future cost. [00:55:43] Alright — if you think about what we have talked about so far, we have thought about these search problems where we think of the state as the past sequence
of actions — the history of actions you have taken, and all that. But right now, for this problem, the state is just the current city; that's enough. [00:55:59] And because of that, you're getting all these exponential savings in time and space, because, again, I can compute the future cost of 5 there and collapse that whole tree into this graph, and just go about solving my search problem on this graph as opposed to that whole tree. [00:56:16] So that's where you get the savings from dynamic programming. And I just want to emphasize that again — let me actually do this. The key idea here, like I was saying, is that there's no magic happening: the key idea is figuring out what your state is. It's actually important to think about what your state is. [00:56:36] In this case, we're assuming the state — a summary of all the past actions we have taken — is sufficient for us
to [00:56:43] that we have taken sufficient for us to choose the optimal future okay so so [00:56:45] choose the optimal future okay so so that's like a mouthful but basically [00:56:47] that's like a mouthful but basically what that means is the only reason [00:56:50] what that means is the only reason dynamic programming works and for this [00:56:52] dynamic programming works and for this particular example we just saw is the [00:56:54] particular example we just saw is the state the way we define it is enough for [00:56:56] state the way we define it is enough for us to plan for the future I might have a [00:56:59] us to plan for the future I might have a different problem where the state like I [00:57:00] different problem where the state like I defined a state in a way that it's not [00:57:02] defined a state in a way that it's not enough for me to define for future but [00:57:04] enough for me to define for future but if [00:57:05] if I want to use dynamic programming then I [00:57:06] I want to use dynamic programming then I got to be smart about choosing my state [00:57:08] got to be smart about choosing my state because because that is the thing that [00:57:09] because because that is the thing that decides for the future so so for example [00:57:12] decides for the future so so for example for this problem like I might visit city [00:57:14] for this problem like I might visit city 1 then 3 then 4 and then 6 and and for [00:57:18] 1 then 3 then 4 and then 6 and and for solving this particular search problem I [00:57:19] solving this particular search problem I just need to know that I'm in City 6 [00:57:21] just need to know that I'm in City 6 that is enough but like maybe I have [00:57:24] that is enough but like maybe I have some other problem that requires knowing [00:57:26] some other problem that requires knowing 1 3 4 and 6 and then because of that [00:57:28] 1 3 4 and 6 and then because of that maybe I need to know the full tree ok so [00:57:30] 
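For reference, the future-cost recurrence being implemented here — in the course's usual notation, where IsEnd, Actions, Cost, and Succ all come from the search problem definition — is:

```latex
\text{FutureCost}(s) =
\begin{cases}
0 & \text{if } \text{IsEnd}(s) \\
\displaystyle\min_{a \in \text{Actions}(s)} \bigl[\, \text{Cost}(s, a) + \text{FutureCost}(\text{Succ}(s, a)) \,\bigr] & \text{otherwise}
\end{cases}
```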
[00:57:30] So this is where the saving comes from: figuring out what the state is, and defining that right. All right, so we'll come back to this notion of state and think about it a little more carefully, but before that, let's just implement dynamic programming real quick.

[00:57:46] So let's go back to our tram problem. I'm back in the tram problem, and let's implement dynamic programming. Okay, so how do we do this? You're basically just writing that math over there into code — that's all we're doing. So we're going to define this future cost: if you're in an end state, return 0. If you're not in an end state, we're going to add up the cost plus the future cost of s′. How do we get s′? Well, you're going to call this successor-and-cost function, so we get the action, the new state, and the cost, and then you're going to take the minimum over all possible actions — the minimum of cost plus future cost of the new state. That is literally what you have on the board. And we return the result, so that is futureCost. What does dynamicProgramming do? It should return futureCost of the initial state, the start state. And you can return the history if you want — in this case, I'm not returning the history.

[00:58:53] Okay, so how do I get savings? Well, I've got to put in a cache, right? That's the only way I'm going to get savings. So that is where I put the cache: if the state is already in the cache, I'll just return my cached value; otherwise I compute it. Question over there — [Student: how are we getting future cost of states? Would we have to have a function implemented that calculates future cost?] So we have this function, futureCost of a state, and it's going to call futureCost recursively: the future cost of a state is equal to the cost of the state and action — in this function I'm saying try out all possible actions — plus the future cost of s′, and s′ comes from the successor-and-cost function.

[00:59:49] All right, and so we do the caching the proper way too, and now we have dynamic programming, so we can basically call this on our tram problem. I'm going to move forward. Okay, so let's do printSolution of dynamicProgramming on our problem — you can again play around with this. The only sanity check I'm doing is whether it gives me the same solution as backtracking search, because I know how that works. So let's just call it on 10, and yeah, it gave me the same answer.
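A runnable sketch of what's being typed here, assuming the course's interface style — a problem object exposing `startState`, `isEnd`, and a `succAndCost` method returning (action, newState, cost) triples. The names and the toy tram problem (walk +1 for cost 1, tram ×2 for cost 2) follow the lecture, but the code below is a reconstruction, not the instructor's verbatim file:

```python
class TransportationProblem:
    """Tram problem from the lecture: from block s you can walk to s+1
    (cost 1) or take the magic tram to 2*s (cost 2). Goal: reach block N."""
    def __init__(self, N):
        self.N = N

    def startState(self):
        return 1

    def isEnd(self, state):
        return state == self.N

    def succAndCost(self, state):
        # Returns a list of (action, newState, cost) triples.
        result = []
        if state + 1 <= self.N:
            result.append(('walk', state + 1, 1))
        if 2 * state <= self.N:
            result.append(('tram', 2 * state, 2))
        return result

def dynamicProgramming(problem):
    cache = {}  # state -> FutureCost(state); the memoization that gives the savings

    def futureCost(state):
        if problem.isEnd(state):
            return 0
        if state in cache:          # already computed: just look it up
            return cache[state]
        # min over actions of Cost(s, a) + FutureCost(Succ(s, a))
        result = min(cost + futureCost(newState)
                     for action, newState, cost in problem.succAndCost(state))
        cache[state] = result
        return result

    return futureCost(problem.startState())

print(dynamicProgramming(TransportationProblem(N=10)))  # → 6 (walk 1→2→3→4→5, tram 5→10)
```

As in the lecture, this version returns only the minimum total cost, not the history of actions; recovering the path would mean also caching the argmin action at each state.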
[01:00:40] All right, so let's go back. Okay, so one assumption that we have here, just to point out, is that we're assuming this graph is going to be acyclic. That's an assumption we need to make when we're solving this dynamic programming problem, and the reason is, well, we need to compute this future cost, right? For me to compute the future cost of s, I need to have already thought about the future cost of s′, so there is this natural ordering that exists between my states. If I think about an example where there are cycles, then I don't have that ordering. Let's say I want to go from A to D here: to compute the future cost of B, I don't really know if I should have computed the future cost of A before, or C before, or in what order I should have gone to compute these future costs. So you actually need to have some way of ordering your states in order to compute these future costs and apply dynamic programming — that's why we can't really have cycles with this algorithm. But we are going to talk about uniform cost search, which actually allows us to have cycles, in a few slides.

[01:01:59] As for the running time, this is actually polynomial time — on the order of the number of states, so order N, where N is the number of states.

[01:02:12] All right, so let's talk about the idea of states a little bit more, because I think this is actually interesting. So let's just reiterate what a state is: a state is a summary of all past actions sufficient to choose future actions optimally. Okay, so is everyone happy with what a state is?
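To see concretely why acyclicity matters, here is a tiny hypothetical graph (not from the lecture) with a cycle between A and B. The same future-cost recursion never bottoms out — FutureCost(A) needs FutureCost(B) and vice versa — and Python eventually stops it with a RecursionError:

```python
# Hypothetical cyclic graph: A -> B (cost 1), B -> A (cost 1), B -> End (cost 1).
# There is no valid order in which to compute FutureCost here, because
# FutureCost(A) depends on FutureCost(B) and FutureCost(B) depends on FutureCost(A).
edges = {
    'A':   [('B', 1)],
    'B':   [('A', 1), ('End', 1)],
    'End': [],
}

def futureCost(state):
    if state == 'End':
        return 0
    # Naive recurrence, no cycle handling: recurses forever on A <-> B.
    return min(cost + futureCost(nxt) for nxt, cost in edges[state])

try:
    futureCost('A')
except RecursionError:
    print('cycle: the recurrence never terminates')
```

Note that memoization alone does not fix this: the cache entry for a state is only written after its future cost has been fully computed, so the A ↔ B loop still recurses forever. That is the gap uniform cost search fills.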
what status so [01:02:29] so so everyone happy but what status so now what we want to do is we want to [01:02:30] now what we want to do is we want to figure out how we should define our [01:02:33] figure out how we should define our state space because again this is an [01:02:34] state space because again this is an important problem right like how we were [01:02:36] important problem right like how we were defining state space is the thing that [01:02:38] defining state space is the thing that gets the dynamic program you're working [01:02:39] gets the dynamic program you're working so so we got it we got to think about [01:02:41] so so we got it we got to think about how to do that so so let's go back to [01:02:43] how to do that so so let's go back to this example and let's just change that [01:02:45] this example and let's just change that a little bit so so this is the same [01:02:46] a little bit so so this is the same example of I'm going from city one to [01:02:48] example of I'm going from city one to see the end I can only move forward and [01:02:51] see the end I can only move forward and it costs CIJ to go from any city I to [01:02:53] it costs CIJ to go from any city I to say DJ and I'm gonna add a constraint [01:02:56] say DJ and I'm gonna add a constraint and the constraint is I can't visit [01:02:58] and the constraint is I can't visit three odd cities in a row so what that [01:03:02] three odd cities in a row so what that means is maybe I'm in state 1 and then I [01:03:08] means is maybe I'm in state 1 and then I went to state 3 or cd1 I went to city 3 [01:03:12] went to state 3 or cd1 I went to city 3 and then after that can I go to City [01:03:15] and then after that can I go to City Simon well no based on this constraint [01:03:18] Simon well no based on this constraint that I've added I can't do that right so [01:03:21] that I've added I can't do that right so I wanted to find a state space that [01:03:23] I wanted to find a state space that 
allows me to keep track of these things [01:03:25] allows me to keep track of these things so I can solve this new search problem [01:03:27] so I can solve this new search problem with this new constraint so so how [01:03:30] with this new constraint so so how should i how should i do so in the [01:03:33] should i how should i do so in the previous problem when we didn't have the [01:03:35] previous problem when we didn't have the constraint our state was just the [01:03:38] constraint our state was just the current city like previously you just [01:03:40] current city like previously you just cared about the current city and the [01:03:43] cared about the current city and the reason we cared about the current city [01:03:44] reason we cared about the current city is like like we were solving the search [01:03:46] is like like we were solving the search problem like they end up in a city you [01:03:48] problem like they end up in a city you need to know how I'm going where I [01:03:50] need to know how I'm going where I should go from 3 so I should I should [01:03:51] should go from 3 so I should I should have my current city in general right so [01:03:53] have my current city in general right so so for the previous problem without the [01:03:55] so for the previous problem without the constraint current city was enough but [01:03:58] constraint current city was enough but now current city is not enough right I [01:03:59] now current city is not enough right I actually need to know like something [01:04:02] actually need to know like something about my path ok yeah that's actually a [01:04:09] about my path ok yeah that's actually a very good point so one suggestion is [01:04:12] very good point so one suggestion is have a count of how many odd States [01:04:14] have a count of how many odd States another maybe like and different maybe [01:04:15] another maybe like and different maybe the first thing I would come to our mind [01:04:17] the first thing I would come 
to our mind is something [01:04:17] is something simpler so maybe we say well the state [01:04:19] simpler so maybe we say well the state is similar to its like the state like [01:04:24] is similar to its like the state like when we say well the state is previous [01:04:26] when we say well the state is previous city and current city [01:04:29] city and current city okay so this is one possible option for [01:04:32] okay so this is one possible option for for my state right cuz cuz if I have [01:04:34] for my state right cuz cuz if I have this if I have this guy as my state and [01:04:37] this if I have this guy as my state and then that is enough right like if I my [01:04:39] then that is enough right like if I my current city is three I know my previous [01:04:41] current city is three I know my previous city was one I know I shouldn't go to [01:04:44] city was one I know I shouldn't go to seven like that's enough for me to make [01:04:45] seven like that's enough for me to make like future decisions yeah but there is [01:04:49] like future decisions yeah but there is a problem with this well what is the [01:04:51] a problem with this well what is the problem so I have n cities right so so [01:04:55] problem so I have n cities right so so current city can take any possible [01:04:57] current city can take any possible action and n possible states previous [01:05:00] action and n possible states previous city can also take and possible options [01:05:03] city can also take and possible options has impossible options so if I think [01:05:04] has impossible options so if I think about the size of my state space it is n [01:05:07] about the size of my state space it is n squirt if I decide to choose this state [01:05:10] squirt if I decide to choose this state but if I decide to choose this state I'm [01:05:12] but if I decide to choose this state I'm going to have n squirt [01:05:14] going to have n squirt States and remember we are doing this [01:05:15] States and 
[01:05:14] And remember, we are doing this dynamic programming thing — we need to actually write down how to get to all of those states, and that's going to be big. But there is an improvement to this, and that's the improvement that was suggested: I don't actually need this whole giant previous city, which has n options. I can just keep track of whether the previous city was odd or not — that's enough, right? I don't care if it was 1 or 3 or whatever; I just care whether the previous city was odd. So another option for my state is to know whether the previous city was odd or not, plus my current city. Do we need the current city? Yes, because we need to know how to get around from there. And then this brings down my state space. How does it bring it down? Well, what's the size of my state space? The current city can take n possible values, and whether my previous city was odd — that's two. So I just brought my state space down from something that was n² to 2n, and that's a good improvement.

[01:06:21] So in general, when you're picking these state spaces, you should pick the minimal sufficient thing for you to make decisions. It's got to be a summary of the previous things that you need to make future decisions, but pick the minimal one, because you're storing these things, and it actually matters to pick the smallest one. So here is an example of exactly that: my state is now this tuple of whether the previous city was odd or not, and my current city. So if I start at city 1, well, I don't have a previous city, and I'm at city 1. I could go to city 3, and I end up in (odd, 3). I could try to go to city seven — well, that's not possible because of the constraint — and you can build out the rest of the tree like that. And the way I'm counting this is: my state is a tuple of two things, right? Whether the previous city is odd or even — that's two options. And then my current city — it could be city 1, city 2, city 3 — so that's n options. I have n options here and two options there, and that's why I'm saying my whole state space is 2 times n.

[01:07:38] All right, so let's try out this next example. Talk to your neighbors about this, and if you have ideas, just let me know in a minute. So what is the difference here? You're traveling from city 1 to city n, and the constraint has changed: now we want to visit at least three odd cities. So that's what we want to do, and the question is: what is the minimal state?

[01:08:29] All right, any ideas? What is a possible state? Don't worry about the minimal one for now — what do I need to keep track of? [Student: the number of odd cities.] Okay, so is that it? Do I just need to know the number of odd cities? What I meant is, I also need to have the current city, right? So one possible option for this new example — I'm going to write it here — is the number of odd cities visited, and my current city. For these particular problems that I've defined here, I need to know where I am, so the current city is a given. Okay, so I want to visit at least three odd cities, so one possible option is to just have a counter: I keep counting the number of odd cities. So this could be one potential state.

[01:09:49] [Student: do the cities need to be different?] So the question is, do the cities need to be different? The way we're defining the problem, we're always moving forward: if I'm in city 1, I can just move forward, I can't go back. But when we talk about the state space, we're talking about the more general setting — some of that 2n might not even be reachable, and that's all right. So this is one option, but I can actually do better than this.

[01:10:24] [Student: once you've visited at least three odd cities, you're done.] Right, so the suggestion there is: you need at least three odd cities, then at least two odd cities, then at least one odd city, and then you're done. One way of formalizing that — and that's exactly right — is: I don't care if I have four odd cities now, or five odd cities; as long as I have at least three, that's good enough. One odd city, two odd cities, three-plus odd cities — that's enough for me. Okay, so if I had the plain counter, the state space is going to be n options for the current city, and the number of odd cities is around n over 2, so it's going to be n² over 2. But if I use this new suggestion, where I don't keep track of four, five, six, seven — I just keep track of one, two, and three-plus — then my state space ends up being 3 times n. I can formally write that as: s equals (the minimum of the number of odd cities and 3, and the current city), and with this state space the size is 3n. So I just brought n² over 2 down to 3n, and that's nice.

[01:11:45] [Student: do we not also need an option for zero odd cities?] We're starting from city 1, so you're already counting that in — but yeah, in general you could have zero.

[01:11:56] All right, so that was that. This is how it looks: you can think of your state space again as a tuple of whether I've visited one, two, or three-plus odd cities, and the city I'm in. I have another example here you can think about later — work it out at home — but basically the question is, again: you're going from city 1 to n, and you want to visit more odd cities than even cities. What would be the minimal state? You can talk about that offline.

[01:12:26] So the summary so far is that a state is going to be a summary of past actions sufficient to choose future actions optimally, and dynamic programming is not doing any magic: it's using this notion of state to bring this exponential-time algorithm down to a polynomial-time algorithm, with the trick of memoization and the trick of choosing the right state.
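The "at least three odd cities" variant can be sketched the same way, with the capped-counter state just described: state = (min(# odd cities visited, 3), current city), giving a 3n state space instead of n²/2. The graph below is hypothetical — cities 1..n, forward moves only, and every move costing 1 for illustration, whereas the lecture's c_ij costs are arbitrary:

```python
def succAndCost(state, n):
    """State = (min(number of odd cities visited, 3), current city).
    You may move forward from city i to any city j > i; for illustration
    every move costs 1 (the lecture's costs c_ij are arbitrary)."""
    numOdd, city = state
    result = []
    for nxt in range(city + 1, n + 1):
        newOdd = min(numOdd + (nxt % 2), 3)   # cap the counter at 3
        result.append((nxt, (newOdd, nxt), 1))
    return result

def minCost(n):
    """DP over the (capped odd count, city) state space: minimum cost to go
    from city 1 to city n having visited at least three odd cities."""
    cache = {}

    def futureCost(state):
        numOdd, city = state
        if city == n:
            # Only a valid end state if we've seen at least three odd cities.
            return 0 if numOdd >= 3 else float('inf')
        if state not in cache:
            cache[state] = min(cost + futureCost(s2)
                               for _, s2, cost in succAndCost(state, n))
        return cache[state]

    # City 1 is odd, so the count starts at 1 (this answers the "zero" question).
    return futureCost((1, 1))

print(minCost(10))  # → 3, e.g. 1 → 3 → 5 → 10: odd cities 1, 3, 5
```

With unit costs the answer for n = 10 is 3 moves; if n is small enough that three odd cities are impossible, the DP correctly returns infinity.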
the trick of using memorization and with the [01:12:44] trick of using memorization and with the trick of choosing the right state okay [01:12:47] trick of choosing the right state okay and we've talked about dynamic [01:12:48] and we've talked about dynamic programming and how it doesn't work for [01:12:51] programming and how it doesn't work for cyclic graphs and now we want to spend a [01:12:53] cyclic graphs and now we want to spend a little bit of time talking about uniform [01:12:55] little bit of time talking about uniform cost search [01:12:57] cost search and how that can help with the cycles so [01:13:01] and how that can help with the cycles so if you guys have seen Dijkstra's [01:13:02] if you guys have seen Dijkstra's algorithm this is very similar to [01:13:04] algorithm this is very similar to de-stress like yeah so it's basically [01:13:07] de-stress like yeah so it's basically like stars but alright so let's let's [01:13:11] like stars but alright so let's let's actually talk about this so so the [01:13:12] actually talk about this so so the observation here is that when we when we [01:13:14] observation here is that when we when we think about the cost of getting from [01:13:16] think about the cost of getting from start state to some s prime well that is [01:13:20] start state to some s prime well that is going to be equal to cost of going from [01:13:23] going to be equal to cost of going from s to s prime and then some past costs of [01:13:26] s to s prime and then some past costs of us and then been dynamic programming [01:13:28] us and then been dynamic programming like we make sure that we have this [01:13:30] like we make sure that we have this ordering and these things are computed [01:13:32] ordering and these things are computed in order so we're not worried about like [01:13:34] in order so we're not worried about like visiting the states like multiple times [01:13:35] visiting the states like multiple times but within in uniform path 
[01:13:39] But in uniform cost search we might visit a state multiple times, and if you have cycles we don't know what order to go in. The order we can go in is this: we can actually compute a past cost (a suggested path cost) and go over the states in order of increasing past cost. [01:13:57] So uniform cost search enumerates states in order of increasing past cost. In this case we need to make an assumption: we need to assume that the costs are non-negative, so I'm making that assumption for uniform cost search. [01:14:16] Here is an example of uniform cost search running... oh, we don't have internet. There is a video of uniform cost search running in action, and if I have time I'll connect to the internet and get it working. But let's talk about the high-level idea of uniform cost search.
[01:14:34] In uniform cost search we have three sets that we need to keep track of. One is the explored set: the states for which we have found the optimal path, the states we're sure about; we've computed the best possible way to get there and we're done with them. Then we have another set called the frontier: these are states that we have seen, where we've computed a cost of getting there (we know somehow how to get there and what the cost would be), but we're just not sure if that was the best way of getting there. [01:15:06] So you can think of the frontier as the known unknowns: I know they exist, but I'm not sure about the optimal way of getting there. And then finally we have the unexplored states, which I haven't even seen yet; I don't even know how to
[01:15:22] get there, and you can think of them as the unknown unknowns. So that's how you would think about these three sets. [01:15:29] Let's actually work out an example for uniform cost search; I'm going to show how uniform cost search runs on this example. As I said, we're going to keep track of three sets: unexplored, frontier, and explored. [01:15:57] All right, so everything is in unexplored at the beginning: A, B, C, and D. What I want to do is go from A to D; I want to find a minimum-cost path from A to D given this graph. [01:16:18] So I'm going to take my initial state, that's A, and put A on my frontier, and it costs zero to get to A because I'm just starting at A. So that's on my frontier. Then the next step is that I'm
[01:16:32] going to pop the thing with the lowest cost off my frontier. There's one thing on my frontier, so I'll just pop that one thing off and move it to explored: the cost of getting to A is zero. [01:16:47] Then, after popping it off my frontier, I'm going to see how I can get from A to any other state. From A I can get to B, that's one option, with a cost of one. Where else can I go? I can go to C, with a cost of 100. [01:17:09] So what I just did is move B from unexplored to the frontier (and I know how to get there, from A), and move C to the frontier (and I know how to get there). Now it's the next round, and I'm looking at my frontier. A is not on my frontier anymore, it's in explored, and I'm going to pop off the thing with the best cost on my frontier. Well, what is that? That's B, so I'm going
[01:17:35] to move B to my explored set. The best way to get to B I already know: that's from A to B, so everything is good. Now that I've popped B off my frontier, I'm going to look at B and see what states I can get to from it. From B I can go to A, but A is already in explored; I already know the best way to get to A, so there's no reason to do that. [01:18:01] From B I can get to C, and that way I can reach C with a cost of 1 plus the cost of B, which is already 1, so 2. So I'm going to erase the 100, because there's a better way of getting there, and that's through B. [01:18:19] And then from B I can get to D, so I'm going to move D from unexplored to the frontier: I can get to it from B, with a cost of 101, because it's 100 plus the cost of getting to B. All right, so I'm done
[01:18:42] exploring everything I can do from B. Going back to my frontier again: A and B are not on my frontier, I just have C and D there. [01:18:49] I'm going to pop off the thing with the best cost, and that is C. I'm going to move it to explored with a cost of two, and the best way to get to it is from B. So we're done with C, and now we see where we can go from C. From C I can go to A; well, that's done, it's already in the explored set, so I'm not going to touch it. Similar thing with B: already in explored, no need to worry about it. [01:19:14] From C I can get to D, and if I want to get to D from C, what would the cost be? It would be two plus one, so I can update this to three, and update the way to get to D to be from C. And then we're done: going back to the frontier, the only thing left on it is D. I'll just pop that off and
[01:19:40] add it to explored, and that is three; that's what I have in my explored set. So the way to get from A to D is by taking this route, and it costs three: A, B, C, and D. Okay, is that clear? [01:20:00] All right, there are two slides left and they're probably going to kick us out soon, so I'll do this next time. Of the two slides left, one just goes over the pseudocode, so take a look at that (the code is online), and there's a small theorem that says this is actually doing the right thing. I'll talk about that next time.

================================================================================
LECTURE 018
================================================================================
Search 2 - A* | Stanford CS221: Artificial Intelligence (Autumn 2019)
Source: https://www.youtube.com/watch?v=HEs1ZCvLH2s
---
Transcript

[00:00:05] Okay, so hi everyone. Today we continue talking about search; that's what we're going to start doing, finishing off some of the things we started talking about last time, and then after that
[00:00:21] switching to some of the more interesting topics, like learning. A few announcements: the solutions to the old exams are online now, so if you want to start studying for the exam you can do that. Start looking at some of those problems; I think that would be useful. [00:00:36] Actually, let me start with the Search 2 lecture, because it has a review of some of the topics we've talked about, so it might be easier to do that. Also, I'm not connected to the network, so we're not going to do the questions or show the videos, because I have a hard time connecting to the network in this room. [00:00:58] Okay, all right, so let's continue talking about search. If you remember, we had this city block problem, so let's go back to that problem and try to do a review of some of the search algorithms we talked
[00:01:17] about last time. Suppose you want to travel from city 1 to city n, only going forward, and then from city n you want to go back to city 1, going only backwards. [00:01:27] So the problem statement is like this: you're starting in city 1, you're going forward, and you're getting to city n, maybe along a route like this; and then after that you want to go backwards and get to city 1 again, going through some of these cities. That's the goal, and the cost of going from any city i to city j is equal to c_ij. [00:02:00] So the question is: which of the following algorithms could you use to solve this problem? It could be multiple of them. We have depth-first search, breadth-first search, dynamic programming, and uniform cost search; these were the algorithms we talked about
[00:02:16] last time. So maybe just talk to your neighbors for a minute, and then we can vote on each one of these. [00:03:05] [students discuss] Okay, so let's start talking about this. How about depth-first search: how many people are saying we can use depth-first search? How many people think we can't? It's a pretty even split. So some people think we can't use depth-first search; what are some reasons? [00:03:34] Yeah, so here we're basically going from city 1 to city n, and each one of these edges has a cost of c_ij. I'm just saying c_ij is greater than or equal to zero; that's the only thing I'm assuming about c_ij. But if you remember depth-first search, you really wanted the costs to just be equal to zero, because (remember that whole tree) the whole point of depth-first search was that I could just stop
[00:03:56] whenever I found a solution, and we were assuming that the cost of all the edges was just equal to zero. So we can't really use depth-first search here, because our costs are not zero. [00:04:07] With that reasoning in mind, how about breadth-first search, can it be used? [00:04:22] So what's being suggested is: can we think about the problem as going from city 1 to city n, and then after that introduce a whole new problem that continues from city n and goes back to city 1? Let me get back to that point in a second, because you could potentially think about it that way; that might actually be an interesting way of looking at it. But irrespective of that, I can't use depth-first search (so far I'm just talking about depth-first search): no matter how I look at the problem, the costs are going to be nonzero, and because the costs
[00:04:54] going to be nonzero so because the costs are going to be nonzero I can't use that [00:04:56] are going to be nonzero I can't use that first search so so let's talk about [00:04:58] first search so so let's talk about matters so how about birth research can [00:05:00] matters so how about birth research can I use breadth-first search you cannot [00:05:09] I use breadth-first search you cannot use fresh first search here because for [00:05:11] use fresh first search here because for breadth-first search if you remember you [00:05:13] breadth-first search if you remember you really wanted all the costs to be the [00:05:14] really wanted all the costs to be the same they didn't need to be 0 date but [00:05:16] same they didn't need to be 0 date but they needed to be the same thing because [00:05:18] they needed to be the same thing because then you could just go over the levels [00:05:19] then you could just go over the levels and then here I'm not like I'm not [00:05:21] and then here I'm not like I'm not saying I'm not putting any restrictions [00:05:23] saying I'm not putting any restrictions on CIJ being the same thing ok so now [00:05:26] on CIJ being the same thing ok so now let's talk about dynamic programming how [00:05:28] let's talk about dynamic programming how about dynamic programming can be used [00:05:29] about dynamic programming can be used sine amock programming alright so that [00:05:33] sine amock programming alright so that looks right right like we could use [00:05:34] looks right right like we could use dynamic programming here everything [00:05:36] dynamic programming here everything looks ok CI J's are positive looks fine [00:05:41] looks ok CI J's are positive looks fine how about actually one question so so [00:05:46] how about actually one question so so don't I have cycles here we kind of [00:05:48] don't I have cycles here we kind of briefly talked about this already [00:05:49] briefly talked about this already so don't I have 
[00:06:03] a cycle here? Well, we can actually use dynamic programming even if it kind of looks like we have a cycle, and the reason is that we can use a trick: we can basically draw this out again, going forward all the way to city n, and then after that going backwards, including the directionality too. [00:06:26] So all I'm doing is extending the state space to be not just the city, but the city in addition to the direction we're going. If I'm in city four here, it's (city four, forward), and if at some point in the future I'm in city four again, it's (city four, backward). I'll keep track of both the city and the direction, and when I do that I'm breaking the cycle; there are no cycles anymore, and I can actually use dynamic programming.
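The extended state space she describes can be sketched like this: a state is a (city, direction) pair, and reaching city n flips the direction, which makes the graph acyclic. The function and variable names here are my own, and the edge-cost dictionary is a made-up placeholder, not the course's code.

```python
# Sketch of the (city, direction) state space for the round trip.
# c[(i, j)] is the cost of traveling between cities i and j (assumed given).
def successors(state, n, c):
    city, d = state
    if d == 'forward':
        for j in range(city + 1, n + 1):
            # arriving at city n turns us around
            nxt = (j, 'backward') if j == n else (j, 'forward')
            yield nxt, c[(city, j)]
    else:
        for j in range(city - 1, 0, -1):   # only backward moves
            yield (j, 'backward'), c[(city, j)]

def is_end(state):
    # done once we are back at city 1 on the return leg
    return state == (1, 'backward')
```

For example, with n = 3 the successors of (1, 'forward') are (2, 'forward') and (3, 'backward'): the same city numbers can appear twice, but never the same (city, direction) pair, so dynamic programming applies.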
[00:07:02] And then uniform cost search: that also sounds good, right? You could actually use uniform cost search; it doesn't matter whether you have cycles or not, and we have non-negative costs, so we could use it. [00:07:12] All right, so this was just a quick review of some of the things we talked about last time. Another thing we talked about last time was this notion of state. We started talking about tree search algorithms, and at some point we switched to dynamic programming and uniform cost search, where we don't have this exponential blow-up; the reason behind that was that we have memoization, and in addition to that we have this notion of state. [00:07:40] So what is a state? A state is a summary of all past actions that is sufficient for us to choose future actions optimally. So we need to be
[00:07:51] really careful about choosing our state. In this previous question we looked at past actions: if you look at all the cities you go through, it can be city one, then three, four, five, six, and then three again. [00:08:04] So in terms of state, what you want to keep track of is which city you're in, but in addition to that you want the directionality, because you need to know where you are and how you're getting back. And we did a couple of examples around that, trying to figure out the specific notion of state for various problems. [00:08:25] All right, so last time we started talking about search problems and formalizing them. If you remember our paradigm of modeling, inference, and learning, we started modeling search problems using this formalism, where we defined a starting state, s_start, and then we talked about the
[00:08:41] actions function, which is a function over states that returns all possible actions. Then we talked about the cost function, which takes a state and an action and tells us the cost of that edge; and the successor function, which takes a state and an action and tells us where we end up. And at the end we had this is-end function, which just checks whether you're in an end state or not. [00:09:04] These were all the things we needed to define a search problem, and we tried it on a couple of examples: the tram example and the city example.
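The five ingredients just listed (start state, actions, cost, successor, is-end) can be written down as a small interface. This sketch uses my own names and a made-up cost function (the forward-only city problem with cost j - i); it mirrors the formalism, not the course's actual code.

```python
# Sketch of the search-problem formalism: s_start, Actions(s),
# Cost(s, a), Succ(s, a), IsEnd(s), for the forward-only city problem.
class TravelProblem:
    def __init__(self, n):
        self.n = n

    def start_state(self):
        return 1                                   # begin in city 1

    def is_end(self, state):
        return state == self.n                     # done at city n

    def actions(self, state):
        # an action is the next city to move to, only going forward
        return list(range(state + 1, self.n + 1))

    def cost(self, state, action):
        return action - state                      # made-up edge cost

    def succ(self, state, action):
        return action                              # deterministic successor
```

Any of the search algorithms below only touch a problem through these five methods, which is what lets the same uniform-cost-search code run on the tram example, the city example, and so on.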
[00:09:17] After talking about these different ways of thinking about search problems, we started talking about various types of inference algorithms. We talked about tree search (depth-first search, breadth-first search, depth-first search with iterative deepening, backtracking search), and then after that some graph search algorithms, like uniform cost search and dynamic programming. [00:09:40] Last time we did an example of uniform cost search, but we didn't get to prove its correctness, so I want to switch to some of last time's slides to go over this quick theorem, and then after that switch back to this lecture. [00:10:01] Okay, so uniform cost search: if you remember what we were doing, we had three different sets. We had an explored set, which was the set of states that we have visited and are sure how to get to; we know the optimal path, we know everything about them. We had the frontier set, which was the set of states that we have gotten to, but we're not sure if the cost we have is the best cost;
there might be a better way of getting to them and we [00:10:28] better way of getting to them and we don't know like we are not sure yet and [00:10:30] don't know like we are not sure yet and then we have the uh neck Sports set of [00:10:32] then we have the uh neck Sports set of states which are basically states that [00:10:34] states which are basically states that we haven't seen yet so we did this [00:10:37] we haven't seen yet so we did this example where we started with all the [00:10:38] example where we started with all the states in the unexplored set and then we [00:10:40] states in the unexplored set and then we moved them to the frontier and then from [00:10:42] moved them to the frontier and then from the frontier we moved them to the export [00:10:44] the frontier we moved them to the export set so so this was the example that we [00:10:46] set so so this was the example that we did under board okay and then we [00:10:49] did under board okay and then we realized that like even if you have [00:10:51] realized that like even if you have cyclones we can actually do this [00:10:52] cyclones we can actually do this algorithm and then [00:10:53] algorithm and then we ended up finding the best path being [00:10:56] we ended up finding the best path being from A to B to C to D and that cost [00:10:58] from A to B to C to D and that cost three [00:10:59] three so let's actually implement uniform cost [00:11:03] so let's actually implement uniform cost search and so I think we didn't do this [00:11:05] search and so I think we didn't do this last time so going back to our set of so [00:11:11] last time so going back to our set of so we started writing up these algorithms [00:11:13] we started writing up these algorithms for search problems so we have we have [00:11:15] for search problems so we have we have written dynamic programming already and [00:11:17] written dynamic programming already and backtracking search so now we can we can [00:11:19] 
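To make the recap concrete, here's a minimal sketch of a search problem with those four ingredients (actions, cost, successor, isEnd), using the tram example; following the lecture's code, successors and costs are combined into one method, though the class and method names here are my assumptions rather than the course's exact starter code.

```python
class TransportationProblem:
    """Tram example: from state s you can walk to s+1 (cost 1)
    or take a magic tram to 2*s (cost 2); start at 1, end at N."""

    def __init__(self, N):
        self.N = N

    def startState(self):
        return 1

    def isEnd(self, state):
        # The end test: are we at the goal state?
        return state == self.N

    def succAndCost(self, state):
        # Combines actions, successor, and cost into one function:
        # returns a list of (action, newState, cost) triples.
        result = []
        if state + 1 <= self.N:
            result.append(('walk', state + 1, 1))
        if 2 * state <= self.N:
            result.append(('tram', 2 * state, 2))
        return result

problem = TransportationProblem(N=10)
print(problem.succAndCost(3))  # → [('walk', 4, 1), ('tram', 6, 2)]
```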
[00:11:21] To do so, we need this priority queue data structure. It's in a util file; I'm just showing you what functions it has: an update function and a removeMin function. It's just the data structure I'm going to use for my frontier, since I'm popping things off the frontier.

[00:11:43] All right, let's go back to uniform cost search. We're going to define this frontier, to which we add states from the unexplored set, and it's going to be a priority queue; we have that data structure because we just imported util. We're going to add the start state with a cost of 0 to the frontier; that's the first thing we do.

[00:12:09] Then, while the frontier is not empty (so, while True), we remove the minimum-past-cost element from the frontier: pop off the best thing that exists there and move it to the explored set. When I pop an element off the frontier, I get its past cost and the state. If we are in an end state, we just return that past cost with the history; I'm not tracking the history here for now, I'm just returning the cost.

[00:12:43] After popping a state off the frontier, the next thing was adding its children. The way we do that is with the successors-and-costs function we defined last time: we iterate over (action, newState, cost) in the successors-and-costs of the state, and update our frontier by adding these new states with cost pastCost plus the edge cost, if that is better; that's what the frontier's update function does.

[00:13:14] And that's pretty much it; that is uniform cost search. You add things to the frontier, you pop things off the frontier, and that way you explore them: you move states from the unexplored set to the explored set. So let's just try that out... it looks like it's doing the right thing; it got the same value as dynamic programming, so it seems to work. This code is also online, so you can take a look at it later. (Actually, this is not what I want to do... yeah, okay.) [00:13:50] And here's also the pseudocode of uniform cost search.
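Putting the steps just described together, here's a self-contained sketch of uniform cost search. The PriorityQueue mimics the update/removeMin interface mentioned above (its internals are my assumption, not the actual util file), and like the lecture's version it returns only the past cost, not the history of actions.

```python
import heapq

class PriorityQueue:
    """Sketch of the util priority queue: update lowers a state's
    priority if better; removeMin pops the best unexplored state."""
    DONE = -100000  # sentinel priority marking explored states

    def __init__(self):
        self.heap = []
        self.priorities = {}  # state -> best priority seen so far

    def update(self, state, newPriority):
        oldPriority = self.priorities.get(state)
        if oldPriority is None or newPriority < oldPriority:
            self.priorities[state] = newPriority
            heapq.heappush(self.heap, (newPriority, state))
            return True
        return False

    def removeMin(self):
        while self.heap:
            priority, state = heapq.heappop(self.heap)
            if self.priorities[state] == PriorityQueue.DONE:
                continue  # stale heap entry; state already explored
            self.priorities[state] = PriorityQueue.DONE
            return state, priority
        return None, None  # frontier is empty

def uniformCostSearch(problem):
    # Frontier starts with just the start state at past cost 0.
    frontier = PriorityQueue()
    frontier.update(problem.startState(), 0)
    while True:
        # Move the best frontier state to the explored set.
        state, pastCost = frontier.removeMin()
        if state is None:
            return None  # frontier empty: no path exists
        if problem.isEnd(state):
            return pastCost  # returning only the cost, not the history
        # Add the children of the popped state to the frontier.
        for action, newState, cost in problem.succAndCost(state):
            frontier.update(newState, pastCost + cost)

class SmallProblem:
    """Tram-style example: walk s -> s+1 costs 1, tram s -> 2s costs 2."""
    def startState(self): return 1
    def isEnd(self, state): return state == 10
    def succAndCost(self, state):
        result = []
        if state + 1 <= 10: result.append(('walk', state + 1, 1))
        if 2 * state <= 10: result.append(('tram', 2 * state, 2))
        return result

print(uniformCostSearch(SmallProblem()))  # → 6 (walk, tram, walk, tram)
```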
[00:14:05] There's a question: what's the runtime of uniform cost search? The runtime of uniform cost search is order n log n, where the log n comes from the bookkeeping of the priority queue, and you're going over all the edges, so you can think of n here as the edges. In the worst case, if you have a fully connected graph, it's technically n squared log n, but in practice we have sparser graphs, so people usually refer to it as just n log n, where n is the number of states you have explored; note it's not all of the states, only the states you have explored. And dynamic programming is order n, so technically dynamic programming is slightly better.

[00:14:54] So, the question was: what's the difference between this and Dijkstra's algorithm? They're very similar. The only difference is that this is solving a search problem, so you're not exploring all the states: when you get to the solution, you just return it. In Dijkstra's algorithm you're basically exploring all of the states in your graph.

[00:15:17] All right, sounds good. Okay, so I just want to quickly talk about this correctness theorem. For uniform cost search we actually have a correctness theorem, which basically says that uniform cost search does the right thing. What the theorem says is: if we have a state s that we are popping off the frontier, moving it from the frontier to the explored set, then its priority value, which is equal to PastCost(s), is actually the minimum cost of getting to the state s.

[00:15:46] So what this is saying is: let's say that this is my explored set,
and then right here is my [00:15:55] exports and then right here is my frontier and I have a start state okay [00:15:59] frontier and I have a start state okay and then I have some state s that right [00:16:04] and then I have some state s that right now I've decided that I am popping off s [00:16:06] now I've decided that I am popping off s from the frontier to export because that [00:16:09] from the frontier to export because that is the best thing that has the best pass [00:16:11] is the best thing that has the best pass cost so what the theorem says is this [00:16:14] cost so what the theorem says is this this path that I have from and start to [00:16:17] this path that I have from and start to s is the shortest path possible to get [00:16:20] s is the shortest path possible to get to get to the state s ok so the way to [00:16:23] to get to the state s ok so the way to prove that is to show that the cost of [00:16:25] prove that is to show that the cost of this path is lower than any other path [00:16:27] this path is lower than any other path paths that go from s start to s so let's [00:16:32] paths that go from s start to s so let's say there is some other path this green [00:16:34] say there is some other path this green one that goes from s star to s some [00:16:37] one that goes from s star to s some other way and the way that it goes to s [00:16:40] other way and the way that it goes to s is it should probably leave the export [00:16:43] is it should probably leave the export set of states from some state called t [00:16:46] set of states from some state called t maybe to some ghost go to some other [00:16:48] maybe to some ghost go to some other state nu and then from you go to us UN s [00:16:52] state nu and then from you go to us UN s can be the same thing but the point of [00:16:53] can be the same thing but the point of it is if I have this other path that [00:16:55] it is if I have this other path that goes through s it needs to leave the [00:16:58] 
goes through s it needs to leave the export set from some state so what I [00:17:02] export set from some state so what I want to show is I want to show that the [00:17:04] want to show is I want to show that the cost of [00:17:06] cost of the green line I want to show that that [00:17:10] the green line I want to show that that is greater than the cost of the black [00:17:12] is greater than the cost of the black line okay all right so the cost of the [00:17:14] line okay all right so the cost of the Green Line what is the cost of the Green [00:17:16] Green Line what is the cost of the Green Line it's going to be the cost to here [00:17:18] Line it's going to be the cost to here and then cost of T to you and the cost [00:17:20] and then cost of T to you and the cost of u to s so I can say well this cost is [00:17:23] of u to s so I can say well this cost is actually greater than or equal to [00:17:27] actually greater than or equal to priority of T because that is the cost [00:17:29] priority of T because that is the cost of getting to T plus cost of C to you [00:17:35] of getting to T plus cost of C to you and I'm just dropping that's this last [00:17:38] and I'm just dropping that's this last part the u to s I'm just dropping that [00:17:39] part the u to s I'm just dropping that okay so cost of green is like at least [00:17:43] okay so cost of green is like at least equal to priority of t plus cost of T TT [00:17:46] equal to priority of t plus cost of T TT to you okay well what is that equal to [00:17:49] to you okay well what is that equal to priority is just a number right it's [00:17:51] priority is just a number right it's just a number that you're getting off [00:17:52] just a number that you're getting off the priority queue so that is actually [00:17:55] the priority queue so that is actually equal to past cost of t plus cost of t [00:18:04] equal to past cost of t plus cost of t to you and and this value is going to [00:18:09] to you and and this value 
is going to actually be greater than or equal to [00:18:11] actually be greater than or equal to priority do you well why is that because [00:18:15] priority do you well why is that because if you is in my frontier I visited you [00:18:18] if you is in my frontier I visited you so I already have some priority value [00:18:20] so I already have some priority value for you and the value that I've assigned [00:18:22] for you and the value that I've assigned for the priority of you is either equal [00:18:25] for the priority of you is either equal to this path cost of T plus cos of T to [00:18:27] to this path cost of T plus cos of T to you because I've like seen that to use [00:18:29] you because I've like seen that to use in my export use in my frontier so I've [00:18:31] in my export use in my frontier so I've definitely seen this or it is something [00:18:33] definitely seen this or it is something better that I don't know what it is [00:18:35] better that I don't know what it is right so so priority of U is going to be [00:18:37] right so so priority of U is going to be less than or equal to this path cost of [00:18:40] less than or equal to this path cost of T plus cost of T to you okay and well [00:18:43] T plus cost of T to you okay and well what do I know in terms of priority of [00:18:45] what do I know in terms of priority of you and priority of s well I know [00:18:48] you and priority of s well I know priority of U is going to be greater [00:18:50] priority of U is going to be greater than or equal to priority of this well [00:18:54] than or equal to priority of this well why is that because I already know I'm [00:18:56] why is that because I already know I'm popping off s next [00:18:57] popping off s next I'm not topping off you like like I I [00:18:59] I'm not topping off you like like I I know I'm popping off the the thing that [00:19:00] know I'm popping off the the thing that has the least amount of priority and the [00:19:03] has the least amount of 
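The chain of inequalities assembled in this argument can be written compactly (notation follows the lecture: priority(·) is the value on the priority queue, PastCost(·) the cost of the best path found so far):

```latex
\begin{aligned}
\text{Cost(green)}
  &\ge \text{PastCost}(t) + \text{Cost}(t, u)
     && \text{drop the nonnegative } u \to s \text{ tail} \\
  &= \text{priority}(t) + \text{Cost}(t, u)
     && t \text{ is explored, so priority}(t) = \text{PastCost}(t) \\
  &\ge \text{priority}(u)
     && u \text{ is on the frontier with at most this cost} \\
  &\ge \text{priority}(s)
     && s \text{ is popped first, so its priority is minimal} \\
  &= \text{PastCost}(s) = \text{Cost(black)}.
\end{aligned}
```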
[00:19:03] And the least value here is s. And well, priority(s) is equal to the cost of the black line.

[00:19:13] So that was a quick proof of why uniform cost search always returns the minimum cost path. All right, let's go to the slides again.

[00:19:29] Just a quick comparison between dynamic programming and uniform cost search. We talked about dynamic programming: we know it doesn't allow cycles, but in terms of action costs it can handle anything; you can have negative costs, you can have positive costs, and its complexity is order n. Uniform cost search can handle cycles, which is cool, but the problem is the costs need to be non-negative, and it's order n log n. And if you end up in a situation where you have cycles and your costs are actually negative, there's this other algorithm called Bellman-Ford, which we are not covering in this class, but it addresses those sorts of settings.

[00:20:16] Okay, so that was this idea of inference. We now have a good set of ways of doing inference for search problems once we have formalized them. The plan for this lecture is to think about learning: how do we go about learning when our search problem is not fully specified, when there are things in the search problem, like the costs, that are not specified and we want to learn what they are? That's going to be the first part of the lecture. Then, towards the end of the lecture, we're going to talk about a few other algorithms that make things faster, smarter ways of making things faster: we're going to talk about A* and
[00:20:59] some relaxation-type strategies.

[00:21:03] All right, so let's go back to our transportation problem. This was the problem where we had a start state, and we can either walk, which takes us from state s to state s+1 at a cost of 1, or take a magic tram that takes us from state s to state 2s at a cost of 2, and we want to get to the end state. We can formalize that as a search problem, as we saw last time, and actually find the best path from state 1 to the end: for example, walk, walk, tram, tram, tram, walk, tram, tram is one potential optimal path one can get.

[00:21:43] But the thing is, the world is not perfect. Modeling is actually really hard; it's not the case that we always have this nice model with everything in it. We could end up in scenarios where we have a search problem and we don't actually know what the costs of our actions are: we don't know what the cost of walking or the cost of the tram is. But maybe we have access to the optimal path: maybe I know the optimal path is walk, walk, tram, tram, tram, walk, tram, tram, but I don't know the costs. The point of learning is to figure out what these cost values are from the optimal paths that we have; I want to actually learn that the cost of walking is 1 and the cost of the tram is 2.

[00:22:27] This is actually a common problem in machine learning in general. For example, you might have data on how a person does something, say how a person grasps an object, and I have no idea what cost the person was optimizing when grasping it, but I have the trajectory: I know the path they took when they picked up the object. So if I have access to that path, I can learn the cost function they were optimizing, and then maybe put that cost function on a robot that does the same thing.

[00:23:07] That's a good question. The question is: is it possible to have multiple solutions? Yes; we're actually going to see later what sorts of solutions we get, and there can be cases with multiple solutions. The ratio is the thing that matters: if walk is 1 and tram is 4, scaling that to 2 and 8 gives you the same sort of behavior. It also depends on what sort of data you have, whether your data allows you to actually recover the true solution. So we're going to
actually talk [00:23:34] solution so so we're gonna actually talk about all this cases okay all right okay [00:23:38] about all this cases okay all right okay so if you think about it when the search [00:23:42] so if you think about it when the search problem we were trying to solve this was [00:23:44] problem we were trying to solve this was the inference problem was when we were [00:23:46] the inference problem was when we were given kind of a search formulation and [00:23:48] given kind of a search formulation and we are given a cost and our goal was to [00:23:51] we are given a cost and our goal was to find a sequence of actions this optimal [00:23:53] find a sequence of actions this optimal sequence of actions that was the [00:23:54] sequence of actions that was the shortest path or the best path and and [00:23:56] shortest path or the best path and and some thought some way and this is a [00:23:58] some thought some way and this is a forwards problem so search is this [00:24:00] forwards problem so search is this forward problem where you're given a [00:24:01] forward problem where you're given a cost and you want to find the sequence [00:24:03] cost and you want to find the sequence of actions okay so it's interesting [00:24:05] of actions okay so it's interesting because learning in some sense is an [00:24:08] because learning in some sense is an inverse problem it's the inverse of [00:24:10] inverse problem it's the inverse of search so the inverse of search is if [00:24:13] search so the inverse of search is if you give me that sequence of actions [00:24:15] you give me that sequence of actions that's the best sequence of actions that [00:24:16] that's the best sequence of actions that you've got then can you figure out what [00:24:18] you've got then can you figure out what the cost this so so in some sense you [00:24:20] the cost this so so in some sense you can think of learning as this inverse [00:24:22] can think of learning as this inverse problem 
of search N and we are going to [00:24:24] problem of search N and we are going to kind of address that so I'm going to go [00:24:27] kind of address that so I'm going to go over one example to talk about learning [00:24:30] over one example to talk about learning and I'm actually going to use the [00:24:33] and I'm actually going to use the notation of the machine learning [00:24:35] notation of the machine learning lectures that we had at the beginning of [00:24:38] lectures that we had at the beginning of like last week basically so let's say [00:24:42] like last week basically so let's say that we have maybe I can draw this so [00:24:51] that we have maybe I can draw this so let's say that we have a search problem [00:24:53] let's say that we have a search problem without costs and that's our input so if [00:24:56] without costs and that's our input so if so so we are kind of framing this [00:24:58] so so we are kind of framing this problem of learning as a prediction [00:25:00] problem of learning as a prediction problem and if you remember prediction [00:25:01] problem and if you remember prediction problems and prediction problems we had [00:25:03] problems and prediction problems we had an input so our input was X okay and in [00:25:09] an input so our input was X okay and in this case you're saying our input is a [00:25:11] this case you're saying our input is a search problem search problem [00:25:15] search problem search problem without costs okay so that is my input [00:25:20] without costs okay so that is my input and then we have outputs and in this [00:25:24] and then we have outputs and in this case my my output Y is this optimal [00:25:27] case my my output Y is this optimal sequence of actions that one could get [00:25:29] sequence of actions that one could get yet so it's a solution path so it's a [00:25:32] yet so it's a solution path so it's a solution and what I want to do is I want [00:25:37] solution and what I want to do is I want to look 
like if you remember machine [00:25:38] to look like if you remember machine learning the idea was I would want to [00:25:40] learning the idea was I would want to find this predictor this F function f [00:25:42] find this predictor this F function f that would take an input f of X and then [00:25:44] that would take an input f of X and then it would basically return the solution [00:25:47] it would basically return the solution path and in other settings that it would [00:25:49] path and in other settings that it would generalize so so that was kind of the [00:25:50] generalize so so that was kind of the idea that we explored in machine [00:25:52] idea that we explored in machine learning and you kind of want to do the [00:25:53] learning and you kind of want to do the same thing in here so let's start with [00:25:56] same thing in here so let's start with I'm gonna draw that here so let's start [00:25:59] I'm gonna draw that here so let's start with an example where we are in city one [00:26:01] with an example where we are in city one and then maybe we walk to City 2 so we [00:26:05] and then maybe we walk to City 2 so we can walk to city 2 and then from there [00:26:08] can walk to city 2 and then from there maybe I have two options I can keep [00:26:10] maybe I have two options I can keep walking to get to City four so I can do [00:26:13] walking to get to City four so I can do walk walk walk [00:26:15] walk walk walk or maybe I can take the tram and end up [00:26:19] or maybe I can take the tram and end up in City four and and the thing is I [00:26:24] in City four and and the thing is I don't actually know what the costs of [00:26:25] don't actually know what the costs of these these actions are I don't know [00:26:28] these these actions are I don't know what the cost of don't walk is what the [00:26:30] what the cost of don't walk is what the cost of tram is but one thing I know is [00:26:34] cost of tram is but one thing I know is that my my solution path 
my y, is equal to walk, walk, walk. [00:26:47] So one way to go about this is to start with some initialization of these costs. The way we're defining these costs, I'm going to write it up here: I'm going to use W, because I want to use the same notation as the learning lectures. So W is going to be the weight of each one of my actions. I have two actions in this case: I can either walk or I can take the tram. So W of action 1 is W of walk, and W of action 2 is W of tram. I'm defining these W values, these weights, just as a function of the action. This could technically be a function of states and actions, but right now I'm simplifying, and I'm saying these W values, the cost
of walking, the cost of going from 1 to 2, just depends on my action. It doesn't depend on what state I'm in. You could imagine settings where it actually depends on what city you're in, too. [00:28:03] Okay, so under that scenario, what is the cost of y? It's going to be W of walk plus W of walk plus W of walk. So what I'm suggesting is, let's just start with something; let's just start with these weights. I'm going to say walking costs three, and it's always going to cost three. Again, the reason it's always going to cost three is that my weights only depend on the action; they don't depend on the state. And I'm going to say, why not, let's just say the tram has a cost of two. Okay, so this doesn't look right, but let's
just say, assume this is the right solution. [00:28:44] Okay, so now what I want to do is update these weights, update these values, in a way that gets me this optimal path, this walk, walk, walk. How can I do that? I started with this random initialization of the weights. Now that I've done that, I can try to figure out what the optimal path is based on these weights. So what is my prediction? That is y prime, my prediction based on these weights that I've just set up. Well, what is it? It is walk, tram, because this costs five and this costs nine. So with these random weights I've just come up with, I'm going to pick walk, tram, and that is my prediction. [00:29:31] Okay, so now what we want to do is update our W's based on the fact that our true label is walk,
[00:29:37] walk, walk, and our prediction is walk, tram. And the algorithm that does this is just about the silliest thing possible. What it's going to do is first look at the true y. So the weights are starting from where I decided: this one is three and this one is two, and I'm going to update them. I'm going to look at every action in this true path, and for every action in it I'm going to downweight the weight of that action. Why am I going to do that? Because I don't want to penalize it, right? This is the true thing, and I want the weight of the true thing to be small. So I see walk; the weight of that was three, so I'm going to bring it down by one and make it two. I see walk again, so I bring it down to one. I see
walk again, so I subtract one again and end up at zero. [00:30:34] Okay, now I'm going to go over my prediction, and for every action I see there I'm going to bring the weight up by one. So I see walk here, and I'm going to bring it up by one. So these were subtract, subtract, subtract, then bring it up by one because it's in my y prime, and then I see tram, and because I see tram I'm going to bring this up by one, which ends at three. So my new weights are: the weight of walk just became one, and the weight of tram just became three. [00:31:18] And now I can repeat this and see whether it gets me the optimal solution or not. So I'm going to try running my search algorithm. If I run my search algorithm, this path costs four and this path costs three, so I'm actually going to get this path. So my new prediction is just
going to be walk, walk, walk. The two are going to be the same thing, my weights are not going to change, and I'm going to converge. [00:31:40] So I'm talking about a very simplified version of this, but the idea is always the same. The very simplified version is the one where I'm saying the W's depend only on the actions. If you make the weights depend on states and actions, there is a more generalized form of this; it's called the structured perceptron algorithm. We'll briefly talk about the version where there are states and actions too, but in this case, where it just depends on the action, you're literally bringing it up by one, and whatever you bring up here you've got to bring down by the same amount, so it's plus one and minus one. [Student] Why are we doing the plus one and the minus one? So, I'll get to that. When I look at y here, right, this
is the thing that I really want. [00:32:28] So when I see walk, I realize that walking was a good thing, so I need to bring down its weight. But if the weights I already had knew that walking is pretty good, I should cancel that out. That's why we're doing the plus one: at this stage I already knew walking was pretty good up here, my prediction also said walk, so if I'm subtracting it off I should add it back to cancel it out. But right here I didn't know walking was good, so I'm going to bring down the weight of walk and bring up the weight of tram. I mistakenly thought tram was the way to go, so to avoid that next time around, I'm going to make the cost of tram higher so I don't take that route anymore. Is there a question here?
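The update just walked through on the board can be written as a tiny Python sketch (the weight dictionary and the action names here are illustrative, not the course's actual code):

```python
# One structured-perceptron update for action-only weights (sketch).
# True path: walk, walk, walk. Prediction under the initial weights
# w = {walk: 3, tram: 2} is walk, tram (cost 5 instead of 9).
w = {"walk": 3, "tram": 2}

y_true = ["walk", "walk", "walk"]   # the true label y
y_pred = ["walk", "tram"]           # the prediction y'

for a in y_true:   # decrease the cost of every action on the true path
    w[a] -= 1
for a in y_pred:   # increase the cost of every action on the predicted path
    w[a] += 1

print(w)  # {'walk': 1, 'tram': 3}, matching the new weights on the board
```

Actions shared by y and y prime cancel out, which is exactly the plus-one/minus-one bookkeeping being described.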
[00:33:21] [Student, partly inaudible] ...so y prime is different from walk? Yes, but what if we have a long sequence and y prime is only different in one small location; would that change the weights efficiently? Yeah, so you're asking: what if my y and y prime are basically the same thing, walk, walk, walk or something, and only the very last one is different? For that last one you're just adding one, right? So it does actually address that, and you can run it until the sequences are exactly the same, so you don't have any mistakes. [00:34:00] [Student] Does it matter if our new costs become negative? It depends on what sort of search algorithm you're using.
[00:34:10] At the end of the day, it's fine if you're using dynamic programming: I can have a negative cost here, and I'm just calling dynamic programming at the end of the day with that, and that is fine. The other one is fine if the cost becomes... [00:34:40] The question is: we got one and three here, is this actually right? If you remember, when we defined the tram problem we said walking costs one and the tram costs two, but we never got that. The reason we never got it is that the solution we get here is just based on our training data. So if my training data is just walk, walk, walk, this is the best thing I can get, and I can converge to a solution where the two end up being equivalent. If I have more data points, I'm going to run this longer, and if I actually try it on other training data, then I
might converge to a different thing. [00:35:15] [Student] As far as initializing the weights, I'm assuming the further away you are from the actual truth, the longer it's going to take? Okay, so the question is how we initialize. In the actual algorithm you're just initializing with zero, we're initializing everything to zero, and it's actually not that bad, because you basically just have this sequence; in the more general case you're computing a feature vector, you compute the full thing and do one single update, so it is not that costly. [00:36:05] [Student, partly inaudible] ...if we have that input, can we incorporate that? So you're saying if we have some prior knowledge about the cost, can it be incorporated? Yeah, that is interesting. In this current format, if you have some prior, then maybe your prediction is going to be better, right? So if you have
some knowledge about it, maybe you'll get a better prediction, and then based on that you won't update as much. So maybe you can incorporate it into the search problem. But again, this is the simplified version of the algorithm, where the weights depend only on the action, so it's not doing anything fancy; it's not doing anything that hard either. [00:36:45] [Student] Does this overfit at all? Yes, it can. I'll show some examples of this: we're going to code this up, and then we'll see overfitting kinds of situations, so I'll get back to that. All right, so let's move on. [00:37:02] Okay, so this is just what's on the slides, what I've already talked about. So here is the example: we start with three for walk and two for tram, and the idea is, how are we going to change the
costs so that we get the solution we were hoping for? And as I was saying, we can assume the costs only depend on the action, so I'm assuming Cost(s, a) is just W(a); in the most general form it could depend on the state too. [00:37:35] Okay, so if you take any candidate output path, what would the cost of that path be? It would just be the sum of these W values over all the edges, so it would be W(a1) plus W(a2) plus W(a3), and as you've seen in this example, the cost of a path is just W(walk) plus W(walk) plus W(walk), or W(walk) plus W(tram). That's all this slide is saying; that's how we compute the cost. [00:38:00] All right, so now let's actually look at this algorithm running in practice. Let me go over this code. So we start by initializing W to be equal to
zero, and then after that we're going to iterate for some number of iterations T, and we have a training set of examples. It might not be just one; here I showed just one example, where the only training example I had was that walk, walk, walk is a good thing, but you can imagine having multiple training examples for your search problem. Then what you can do is compute your prediction, that is y prime, given some W. You can start with W equal to zero, compute your prediction y prime, and then do this plus-and-minus type of update: for each action in your true y, your true label, you subtract one, to decrease the cost of the true y, and for each action in your prediction you add one, to increase the cost of the predicted y.
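Spelled out in Python, the loop on the slide looks roughly like this sketch. `predict` here stands in for whatever search procedure computes y prime, and the optional `w0` starting point is an addition of mine so the earlier blackboard numbers can be reproduced; the slide itself starts from zero:

```python
def structured_perceptron(examples, actions, predict, num_iters=20, w0=None):
    """examples: list of (x, true_actions); predict(x, w) returns an action list."""
    w = dict(w0) if w0 else {a: 0 for a in actions}
    for t in range(num_iters):
        mistakes = 0
        for x, true_actions in examples:
            pred_actions = predict(x, w)      # y' under the current weights
            if pred_actions != true_actions:
                mistakes += 1
            for a in true_actions:            # decrease cost of the true path
                w[a] -= 1
            for a in pred_actions:            # increase cost of the predicted path
                w[a] += 1
        if mistakes == 0:                     # predictions match all the labels
            break
    return w

# Tiny usage matching the blackboard example: the true path is walk,walk,walk,
# and this toy predictor compares just the two candidate paths discussed above.
def predict(x, w):
    paths = [["walk", "walk", "walk"], ["walk", "tram"]]
    return min(paths, key=lambda p: sum(w[a] for a in p))

w = structured_perceptron([(None, ["walk", "walk", "walk"])],
                          ["walk", "tram"], predict,
                          w0={"walk": 3, "tram": 2})
print(w)  # {'walk': 1, 'tram': 3}: walking is now cheaper, as on the board
```

Note that when the prediction already matches the true label, the subtract-one and add-one passes cancel exactly, so converged weights stop changing.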
[00:39:03] Okay, all right, so let's look at implementing this and try some examples. Let's go back to the problem. This is again the same tram problem; we just want to use the same sort of format. I actually went back and wrote up the history here: if you remember, last time I was saying I'm not returning the history, and now we have a way of returning the history from each one of these algorithms, because we're going to call dynamic programming and we need the history. [00:39:30] All right, so let's go back to our transportation problem. We had costs of 1 and 2 for walking and the tram, but what we want to do is put parameters there, so we actually put this weight in and give it to our transportation problem. In addition to the number of blocks, I'm now going to give it the weights of the different actions. All right, so then walking has a weight and
[00:39:58] the tram has a weight, so now I have updated my transportation problem to take different values in general. So now we want to be able to generate some training examples. That's what I want to do: I want to generate different training examples that we can call on to get these true labels. Let's assume the true weights for our training examples are just one and two; that is what we really want. [00:40:26] And then we're going to write this prediction function that we can call later to get different values of y. The prediction function is going to get the number of blocks, so it gets an n, the number of blocks here, and it's going to output this path that we want, these y values, each time. All right, so the whole point of prediction is basically running
this f of x function. [00:41:04] We can define our transportation problem with n and the weights, and the way we're going to get this is by calling dynamic programming. Someone asked earlier, could the cost be negative? Well, yes: now I'm calling dynamic programming, and if the problem has negative costs, that's fine too. The history is going to have the action, the new state, and the cost, right? But the thing I actually want to return from my predict function is a sequence of actions, so I'll just get the actions out of this history that I get from dynamic programming. So calling dynamic programming on my problem returns a history; I get the sequence of actions from that, and that is my predict function, which I can just call later. [00:41:49] So let's go back to generating examples. I'm just going to try n going from 1
to 10, so one block to ten blocks, and we're calling the predict function on these true weights to get the true y values. These are my true labels, and those are my examples; my examples are just calling generate examples here. [00:42:14] So let's print out our examples and see what they look like. We haven't done anything in terms of the algorithm yet; we're just creating these training examples by calling this predict function on the true weights. I have a typo here, generate examples needs parentheses; I'll fix the typo. Okay, so that kind of looks right, right? Those are my training examples, one through nine, and each is the path you would want to take if you had these true weights, the one and two. [00:42:49] Okay, so now I have my examples, and I'm ready to write this structured perceptron algorithm. It gets my examples, the training examples, which are these
[00:43:02] It gets the training examples, which are these paths, and then we're going to iterate for some range, and then we can basically go over all the examples that we have, our true y values, and then we can go and update our weights based on that and based on our predictions. So let's initialize the weights to just be zero; that's for walk and tram, they're both just zero. And pred actions, this is when we're calling predict based on the current weights; so if my current weights are zero, then pred actions is just that y prime. So pred actions is y prime, true actions is y, like the things that we had on the slides. Okay, and then I want to count the number of mistakes I'm making too: if the two are not equal to each other, then I'm going to keep a counter for the number of mistakes; if the two become equal, then my number of mistakes is zero and I can break out of the loop.
[00:44:02] Maybe I'm happy then. Okay, so I make a prediction, and then after that I'm going to update the weight values. Okay, so how do I update? Well, basically subtract one if you're in true actions, which is y, the labels that I've created from my training examples, and then plus one if you're in pred actions, based on the current weight values. And that's pretty much it; that is structured perceptron. Okay, so let's just print things nicely, so we can print the iteration and the number of mistakes we have and what the weight values actually are, and I'm just breaking whenever I have no mistakes. So if the number of mistakes is zero, I'll just break. That sounds good. Okay, so all good, I'm going to run this; it's not going to do anything because I didn't call it, so I'll go back and actually call it.
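The loop being built up here can be sketched end-to-end. The predict function (dynamic programming over the walk/tram problem) is re-sketched inline so the snippet is self-contained; names are illustrative, and tie-breaking details differ from the live session, so the learned weights may differ from the (1, 2) recovered in lecture while still reaching zero mistakes.

```python
# A runnable sketch of the structured perceptron loop walked through above.

def predict(n, weights):
    """Cheapest action sequence from block 1 to block n under `weights`."""
    cache = {}
    def future(s):
        if s == n:
            return (0.0, [])
        if s not in cache:
            cands = []
            if s + 1 <= n:                      # walk: s -> s+1
                c, rest = future(s + 1)
                cands.append((weights['walk'] + c, ['walk'] + rest))
            if 2 * s <= n:                      # tram: s -> 2s
                c, rest = future(2 * s)
                cands.append((weights['tram'] + c, ['tram'] + rest))
            cache[s] = min(cands, key=lambda t: t[0])
        return cache[s]
    return future(1)[1]

def structured_perceptron(examples, num_iters=100):
    w = {'walk': 0.0, 'tram': 0.0}              # initialize weights to zero
    for it in range(num_iters):
        num_mistakes = 0
        for n, true_actions in examples:
            pred_actions = predict(n, w)        # y' under the current weights
            if pred_actions != true_actions:
                num_mistakes += 1
                for a in true_actions:          # subtract 1 for true actions
                    w[a] -= 1
                for a in pred_actions:          # add 1 for predicted actions
                    w[a] += 1
        print(f'iteration {it}: numMistakes = {num_mistakes}, weights = {w}')
        if num_mistakes == 0:                   # fits the training data: done
            break
    return w

true_weights = {'walk': 1.0, 'tram': 2.0}
# n = 1..9, as in the lecture's printed examples (range endpoint is exclusive).
examples = [(n, predict(n, true_weights)) for n in range(1, 10)]
learned = structured_perceptron(examples)
```

After convergence the learned weights make zero mistakes on the training paths, even when they are not numerically equal to the true weights; as discussed in the lecture, only the walk/tram ratio is really pinned down by the data.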
[00:45:20] I have another typo here; see if you guys can guess where my typo is... this is going to give an error. Well, I called it "weights", not "weight", so I'll go fix that. Okay, this is running, and then this is what we get. So let's actually look at this. What we got is: in the first iteration the number of mistakes was six, and actually by the first iteration we ended up converging to one, two. So in the second iteration the number of mistakes just became zero, and we got one, two, which is the weights that we were hoping for. Okay, so that kind of looks okay to me; that's my training data, everything looks fine. There's a question, actually.
[00:46:25] [Student question, partly inaudible: something like, we're assuming the numbers of walks and trams are different; if the tram was in a different location but the number of appearances was the same, would it still solve it?] I see what you're asking; no, it should figure that out. We can go over an example after class and I'll show you how it actually does it. All right, so let's try 1 and 3. With 1 and 3 it takes a little bit longer, but it does recover. And 1 and 4 is actually the interesting one, because it does recover something: it recovers 2 and 8. It doesn't recover 1 and 4, but given my data, 2 and 8 is fine; there is no reason for me to get exactly 1 and 4, because the ratio of them is the thing that I actually care about. So even if I get 2 and 8, that is a reasonable set of weights that one could get. I'm going to try a couple more things, so let's try 1 and 5. I try 1 and 5, and this is what I get.
[00:47:35] I get the weight of walk to be minus one and the weight of tram to be one, and the number of mistakes is zero. So why is this happening? — Your training data is just all walking, so it's learning to just walk. — Yeah, so what's happening here is, if you look at my training data up here, my training data just has all walks; it hasn't seen tram, ever, so it has no idea what the cost of tram is with respect to the cost of walk, and it's not going to learn that. So we're going to fix that; one way to fix it is to go and change the training data and actually get more data, so we can do that. Just one thing to remember is that this is just going to fit your training data, whatever it is. So when we fix that, walk becomes 2 and tram becomes 9, which is not 1 and 5, but it is getting there; it's a better ratio, and the number of mistakes is still zero. So it really depends on what you're looking for.
[00:48:31] If you're trying to match your data, and your number of mistakes is zero and you're happy with that, you can just go with this, even though it hasn't actually recovered the exact values or the exact ratios; that's fine. Or maybe you're looking for the exact ratios, and then you should run it longer, for more iterations. There's a question: is structured perceptron susceptible to getting stuck in local optima? Sorry, I was looking away. That is a good question; actually, let me think about that. Percy, do you actually know if this gets into local optima? I haven't experienced it personally. — I feel like there are reasons for it to do this; let me think about it, because even in a more general form it's commonly used in matching, like words and sentences, and I haven't experienced it either, but I can look into that.
[00:49:51] And I'll get back to you. — Aren't we just feeding it all of the optimal paths? — Yes; if we do feed it all the optimal paths, then technically it should just converge, right, because it can just match them. — If you're feeding it all the optimal paths, it should just be matching paths, you're saying; yeah. So in terms of bringing down the number of mistakes, it should always match. But if you have some true weights that you're looking for, and they're not represented in your data set, then it's not necessarily learning those; in those settings it could fall into local optima. Kind of another version of this is when you're doing reward learning, and you actually have a true reward you want to find; in those settings you can totally fall into local optima,
[00:50:43] because you want to find out what the reward function is. But you're right that if you're just matching the data... — So the scaling would be a different problem, right? — Yeah, so the scaling is: you can have reward shaping, so you can have different versions of the reward function, and if you get any of them, that is fine; but you might still get into local optima that's not explained by reward shaping. Okay, so we can talk about these things, they're fine, but maybe I should just move on to the next topic, because we have some more stuff going on. Okay, so I was actually going to skip these slides because we have stuff coming up, but this is a more general form of it. So remember, I was saying this w is a function of a; but you could have a more general form where your cost function is not just w as a function of a, it is actually w times a set of features.
[00:51:41] And then the cost of a path is w times the features of the path, which is just the sum of the features over the edges. So you can have this more general form; go over the slides later on, maybe, because we've got to move to the next part. But just real quick, the update here is this more general form of update: update your w by subtracting the features over your true path and adding the features over your predicted path. The more general form of this is called the Collins algorithm; Mike Collins was working on this in natural language processing, and he was actually interested in it in the setting of part-of-speech tagging. So you might have a sentence, and you want to tag each of the words here as a noun or a verb or a determiner, and so on. So he was basically looking at this problem as a search problem.
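The general update just stated can be written out with an explicit feature map. The simplest choice, one indicator feature per action, recovers the plus-one/minus-one action-count update used earlier; the names here are illustrative.

```python
from collections import Counter

# Sketch of the general feature-based update described above:
#   w <- w - phi(true path) + phi(predicted path),
# where phi(path) sums features over the edges.

def phi(path_actions):
    return Counter(path_actions)            # feature vector as sparse counts

def perceptron_update(w, true_actions, pred_actions):
    for f, v in phi(true_actions).items():  # subtract features of true path
        w[f] = w.get(f, 0.0) - v
    for f, v in phi(pred_actions).items():  # add features of predicted path
        w[f] = w.get(f, 0.0) + v
    return w

w = perceptron_update({}, ['walk', 'walk', 'tram'], ['walk'] * 4)
print(w)  # {'walk': 2.0, 'tram': -1.0}
```

With richer features (e.g. per-edge or per-tag indicators, as in part-of-speech tagging), only `phi` changes; the update stays the same.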
[00:52:31] And he was using similar types of algorithms to try to match each of these part-of-speech tags to the sentence. So he has some scores, and based on the scores and his data set he moves the scores up and down, which uses the same idea. You can use the same idea again in machine translation: if you have heard of beam search, you can have a bunch of translations of some phrase, and then you can up-rate and down-rate them based on your training data. Okay, all right. So now let's move to A*; A-star, not AI-star: A* search. All right, so we've talked about this idea of learning costs; we've talked about search problems in general, doing inference, and then doing learning on top of them.
[00:53:26] And now I want to talk a little bit about making things faster, using smarter ideas and smarter heuristics. There's a question. — What is the loss function that we are trying to minimize in this structure? — So this is a prediction problem, right; in this prediction problem we're trying to figure out what the w's are while matching these y primes as closely as possible to y. So basically, the way we're solving this is not necessarily as an optimization, the way we have solved other types of learning problems; the way we're solving it is just by tweaking these weights to try to match my y prime as closely as possible to y. Okay, all right. So let's talk about A*. I don't have internet, so I can't show these, but I think the links should work when you go to the file.
[00:54:27] So the idea is, if you go back to uniform cost search: in uniform cost search, what we wanted to do was get from a point to some solution, but we would uniformly explore the states around us until we reached some final state. The idea of A* is to basically do a uniform cost search, but do it a little bit smarter and move towards the direction of the goal state. So if I have a goal state, say in that corner, maybe I can move in that direction more cleverly, right. Okay, so here is an example of that, pictorially. I can start from a start, and if I'm using uniform cost search, again I'm uniformly exploring all the states possible until I hit my s_end, and then I'm happy, I'm done, I've solved my search problem, everything is good. But the thing is, I've done all this wasted effort on this side, which is just not that great.
[00:55:22] So uniform cost search, in that sense, has this problem of exploring a bunch of states for no good reason, and what we want to do is take into account that we're just going from s_start to s_end, so we don't really need to do all of that; we can actually just try to get to the end state. Okay, so going back to how these search problems work: the idea is to start from s_start and then get to some state s, and then we have this s_end. What uniform cost search does is it basically orders the states based on PastCost(s), and then explores everything around it based on PastCost(s) until it reaches s_end. Okay, but when you're in state s, there is also this thing called FutureCost(s), right; and ideally, when I'm in the state s, I don't want to explore other things on this side.
[00:56:25] I actually want to move in the direction of reducing my future cost and getting to my end state. Okay, so the cost of me getting from s_start to s_end is really just PastCost(s) plus FutureCost(s), and if I knew what FutureCost(s) was, I would just move in that direction. But if I knew what FutureCost(s) was, well, the problem would already be solved, right; I'd have the answer to my search problem, and I'm still solving the problem. So in reality I don't have access to future costs; I have no idea what the future cost is. But I do have access to something else, potentially, and I'm going to call that h(s), and that is an estimate of the future cost. So I'm going to add a function called h(s), and this is called a heuristic, and a heuristic can estimate what the future cost is.
[00:57:21] And if I have access to this heuristic, maybe I can update my cost so that, in addition to the past cost, I add this heuristic, and that helps me be a little bit smarter when I'm running my algorithm. Okay, so the idea is: ideally, what I would want to do is explore in the order of PastCost(s) plus FutureCost(s). I don't have the future cost; if I had the future cost, I'd have the answer to my search problem. Instead, what A* does is it explores in the order of PastCost(s) plus some h(s). So remember, uniform cost search explores just in the order of past costs; in uniform cost search we don't have that h(s). And h(s) is a heuristic; it's an estimate of the future cost. All right, so what does A* do? It's actually something really simple. A* basically just does uniform cost search; all it does is uniform cost search with a new cost.
before I had this [00:58:18] with a new cost so before I had this blue cost cost of SN a this was my cost [00:58:22] blue cost cost of SN a this was my cost before now I'm going to update my cost [00:58:24] before now I'm going to update my cost to be discussed prime of SN a which is [00:58:28] to be discussed prime of SN a which is just cost plus the heuristic over the [00:58:30] just cost plus the heuristic over the successor of SN a minus the heuristic so [00:58:33] successor of SN a minus the heuristic so so that is the new cost and I can just [00:58:36] so that is the new cost and I can just run uniform cost search on this new cost [00:58:38] run uniform cost search on this new cost so so I'm gonna call it cost prime [00:58:42] so so I'm gonna call it cost prime listener well what is that equal to that [00:58:46] listener well what is that equal to that is equal to cost of SN a which is what [00:58:48] is equal to cost of SN a which is what we had before when we're doing uniform [00:58:50] we had before when we're doing uniform cost search plus heuristic over [00:58:54] cost search plus heuristic over successor of SN a - heuristic over s so [00:59:02] successor of SN a - heuristic over s so why do I want this well what this is [00:59:05] why do I want this well what this is saying is if I'm at in some state s ok [00:59:08] saying is if I'm at in some state s ok and there is some water state successor [00:59:12] and there is some water state successor of SN a so I can take an action a and [00:59:15] of SN a so I can take an action a and end up in successor of SN a and there is [00:59:17] end up in successor of SN a and there is some s end here that I'm really trying [00:59:19] some s end here that I'm really trying to get to remember H was my estimate of [00:59:23] to get to remember H was my estimate of future cost what this is saying is my [00:59:26] future cost what this is saying is my estimate of future cost for getting from [00:59:29] estimate of future 
[00:59:26] What this is saying is: my estimate of the future cost of getting from Succ(s, a) to s_end, minus my estimate of the future cost of getting from s to s_end, should be the thing I'm adding to my cost function; I should penalize by that. And what this is really enforcing is that it makes me move in the direction of s_end, because if I end up in some other state that is not in the direction of s_end, then the thing I'm adding here is basically going to penalize that, right; it's going to say, well, it's really bad that you're taking that action, I'm going to put more cost on it, so you never go in that direction; you should go in the direction that goes towards your s_end. And that all depends on what your h function is, how good of an h function you have, and how you're designing your heuristics, but that's kind of the idea behind it.
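The reduction just described can be sketched directly: A* is uniform cost search run on the modified edge costs Cost'(s, a) = Cost(s, a) + h(Succ(s, a)) − h(s). The chain graph below (all edges cost 1) and the names are illustrative, not the lecture's exact code.

```python
import heapq

def uniform_cost_search(start, goal, succ_and_cost):
    """Dijkstra-style UCS; returns the minimum total cost from start to goal."""
    frontier = [(0, start)]
    explored = set()
    while frontier:
        past_cost, s = heapq.heappop(frontier)
        if s == goal:
            return past_cost
        if s in explored:
            continue
        explored.add(s)
        for action, s2, cost in succ_and_cost(s):
            heapq.heappush(frontier, (past_cost + cost, s2))

def a_star(start, goal, succ_and_cost, h):
    # UCS on Cost' explores toward the goal; a consistent h keeps Cost' >= 0.
    def modified(s):
        return [(a, s2, cost + h(s2) - h(s)) for a, s2, cost in succ_and_cost(s)]
    # The h terms telescope along any path, shifting every path's cost by the
    # constant h(goal) - h(start); undo that shift to recover the true cost.
    return uniform_cost_search(start, goal, modified) + h(start) - h(goal)

edges = {'A': ['B'], 'B': ['A', 'C'], 'C': ['B', 'D'], 'D': ['C', 'E'], 'E': ['D']}
succ = lambda s: [('move', s2, 1) for s2 in edges[s]]
h = {'A': 4, 'B': 3, 'C': 2, 'D': 1, 'E': 0}.get
print(a_star('C', 'E', succ, h))  # 2, the same cost UCS finds, explored smarter
```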
[01:00:22] So let's say we have this example, where we have A, B, C, D, and E, with a cost of 1 on all of these edges, and what we want to do is go from C to E — that's our plan. If I'm running uniform cost search, what would I do? I'm at C, so I'm going to explore B and D, because they have a cost of 1; after that I'm going to explore A and E, and then finally I get to E. But why did I spend all that time exploring A and B? I shouldn't have done that — A and B are not in the direction of getting to s_end. So instead, if someone comes in and tells me, "I have this heuristic function, you can evaluate it on your state," and this heuristic function gives you 4, 3, 2, 1, and 0 for each of these states, then you can update your cost and maybe have a better way of getting to s_end.

[01:01:09] This heuristic is actually perfect in this case, because it's exactly equal to the future cost — the point of the heuristic is to get as close as possible to the future cost, and this one is exactly equal to it. So with this heuristic, my new cost is going to change. How is it going to change? It's going to become the cost of the edge before, which was 1, plus the heuristic difference. For example, the cost of going from C to B: it's the old cost, which was 1, plus the heuristic at B, which is 3, minus the heuristic at C, which is 2. That gives 1 + 3 − 2 = 2. Similarly, you can compute all of these new cost values — the purple values — and that gives a cost of 2 for going in this direction and a cost of 0 for going toward E. And if I just run uniform cost search again here, I can get to E much more easily.
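To check the arithmetic, here is the same reweighting applied to every edge of this line graph — a sketch; the edge-dict encoding is mine, not the course's:

```python
# Perfect heuristic h = FutureCost = distance to E on the line A-B-C-D-E.
h = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}
line = ["A", "B", "C", "D", "E"]
edges = [(s, t) for s, t in zip(line, line[1:])] + \
        [(t, s) for s, t in zip(line, line[1:])]

# New cost of each unit-cost edge: cost' = 1 + h(target) - h(source).
cost_prime = {(s, t): 1 + h[t] - h[s] for s, t in edges}

print(cost_prime[("C", "B")])  # 1 + 3 - 2 = 2, the value computed in lecture
print(cost_prime[("C", "D")])  # 1 + 1 - 2 = 0: stepping toward E is free
```

Every edge pointing toward E gets modified cost 0 and every edge pointing away gets 2, so uniform cost search on the purple costs walks straight from C to E.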
[01:02:05] [A student asks whether A* is related to greedy approaches.] So the question is whether A* is like a greedy approach. No — we're going to talk about that a little bit. A* depends on the heuristic you are choosing: depending on the heuristic, A* is actually going to return the optimal value. It does depend on the heuristic, but it does the exact same thing as uniform cost search if you choose a good heuristic. [Another student asks why the cost from C to B looks bad when it really isn't.] Oh, I see what you're saying — that's what we started with. This is the graph I started with: the blue costs were all 1. But now I'm saying those costs are not good, and I'm going to update them based on this heuristic so I can get closer to the goal as fast as possible.

[01:03:26] [A student asks what cost you would return at the end.] Right — the question is what cost you return at the end, and you do want to return the actual cost. You return the actual cost, but you run your algorithm with this heuristic term added in, because that allows you to explore fewer things and be more efficient. Okay, I've got to move on.

[01:03:49] All right, so a good question to ask is: what does this heuristic look like? Does any heuristic work? It turns out that not every heuristic works. Here's an example. Again, the blue values are the costs that are already given — these are the things I already have, and I can just run my search algorithm — but the red values are the heuristic values. Someone gave them to me for now; in general we would want to design them. So someone comes in and gives me these heuristic values, and what I want to do is compute the new cost values. The question is: is this heuristic good? I get my new cost values, and they look like this. Does this work? We don't have time, so I'm going to answer: it's not going to work. The reason is that we just got a negative edge there. I'm running uniform cost search — at the end of the day, A* is just uniform cost search — and I can't have negative edges. So it's just not a good heuristic to have here. The heuristics need to have specific properties, and we should think about what those properties are.
[01:04:59] One property that you would want heuristics to have is this idea of consistency — this is actually the most important property. So we've talked about heuristics; now I'm going to talk about their properties: heuristics h should be consistent. A consistent heuristic has two conditions. The first condition is that it satisfies the triangle inequality, and what that means is that the updated cost should be non-negative: Cost'(s, a) ≥ 0. That means the old Cost(s, a), plus h of the successor — I'm going to use s' for that — minus h(s), is greater than or equal to zero. That's the first condition. The second condition is that the future cost of s_end is equal to zero — the future cost of the end state should be zero — so the heuristic at the end state is also equal to zero.

[01:06:08] These are the properties we would want if we're talking about consistent heuristics, and they're kind of natural things to want. The first one is basically saying that the costs you end up with should be greater than or equal to 0, so you can run uniform cost search on them, but it's really talking about the triangle inequality: h(s) is an estimate of the future cost, so if from s I take an action with cost Cost(s, a) and add h(Succ(s, a)), that should be greater than or equal to h(s), the estimate of the future cost from s. That's all it's saying. And the last condition also makes sense.
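As a sketch, both conditions can be checked mechanically on a finite graph — the function name and the edge-list encoding here are mine, not the course's:

```python
def is_consistent(edges, h, end_states):
    """Check the two consistency conditions from the lecture:
    (1) triangle inequality: cost(s,a) + h(s') - h(s) >= 0 for every edge,
        i.e. every modified edge cost is non-negative;
    (2) h(s_end) = 0 for every end state."""
    triangle = all(c + h[t] - h[s] >= 0 for s, t, c in edges)
    zero_at_end = all(h[s_end] == 0 for s_end in end_states)
    return triangle and zero_at_end

edges = [("C", "B", 1), ("B", "C", 1), ("C", "D", 1),
         ("D", "C", 1), ("D", "E", 1), ("E", "D", 1)]
good = {"B": 3, "C": 2, "D": 1, "E": 0}
bad = {"B": 0, "C": 3, "D": 1, "E": 0}  # h(C)=3 makes C->D negative: 1+1-3

print(is_consistent(edges, good, ["E"]))  # True
print(is_consistent(edges, bad, ["E"]))   # False
```

The `bad` heuristic fails exactly the way the earlier example did: one modified edge cost goes negative, which uniform cost search cannot handle.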
[01:06:52] I do want my future cost at s_end to be 0, so the heuristic at s_end should also be equal to 0, because again, the heuristic is just an estimate of the future cost.

[01:07:02] All right, so what do I know about A* beyond that? One thing we know is that if h is consistent — if I have this consistency property — then A* is correct. There's a theorem that says A* is correct if h is consistent. We can look at that through an example. Let's say I start at s0, take a1 and end up at s1, take a2 to get to s2, and take a3 — so I have a path that looks like this. If I look at the cost of each of these edges — say Cost'(s0, a1) — well, what is that equal to? That's my updated cost: the old cost, Cost(s0, a1), plus the heuristic value at s1, minus the heuristic value at s0. So that is the cost of starting at s0 and taking a1.

[01:08:26] I'm going to write out the costs for the rest of this path to figure out the cost of the path — the cost of the path is just the sum of these edge costs. For (s1, a2) it is Cost(s1, a2) plus the heuristic at s2 minus the heuristic at s1; that is the new cost of that edge. And the new cost of the last edge, Cost'(s2, a3), is equal to the old cost Cost(s2, a3) plus the heuristic at s3 minus the heuristic at s2. So I've written out all these costs, and if I'm talking about the cost of a path, it's just these costs added up. If I add them up, what happens? A bunch of things get cancelled out — this term cancels, this term cancels — and what I end up with is that the sum of the new costs, the Cost'(s_{i−1}, a_i), is just equal to the sum of my old costs, Cost(s_{i−1}, a_i), plus my heuristic at the last state — the end state — minus the heuristic at s0.

[01:09:58] Now, I'm saying my heuristic is a consistent heuristic. What is a property of a consistent heuristic? The heuristic value at s_end should be equal to zero, so that term is also equal to zero. So what I end up with is: if I look at a path with the new costs, the sum of the new costs is just equal to the sum of the old costs minus some constant, and this constant is just the heuristic value at s0. So why is this important? Because of correctness: remember, we proved at the beginning of this lecture that uniform cost search is correct, so the cost it returns is optimal.
[01:10:42] A* is just uniform cost search with a new cost — A* is just running on this new cost — but this new cost is the same as the old cost minus a constant. So if I'm optimizing the new cost, it's the same thing as optimizing the old cost, and it is going to return the optimal solution. All right, that's basically the same thing that's on the slide. So that's one property: we talked about heuristics being consistent, and we have now shown that A* is correct — because it's uniform cost search — but it's correct only if the heuristic is consistent, because that consistency gets us the fact that this term is equal to 0 and the fact that the new edge costs are going to be non-negative, so I can run uniform cost search on them.
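Written out, the cancellation argument is:

```latex
\sum_{i=1}^{n} \mathrm{Cost}'(s_{i-1}, a_i)
  = \sum_{i=1}^{n} \bigl[\mathrm{Cost}(s_{i-1}, a_i) + h(s_i) - h(s_{i-1})\bigr]
  = \sum_{i=1}^{n} \mathrm{Cost}(s_{i-1}, a_i) + h(s_n) - h(s_0)
  = \sum_{i=1}^{n} \mathrm{Cost}(s_{i-1}, a_i) - h(s_0),
```

where the last step uses consistency, $h(s_n) = h(s_{\mathrm{end}}) = 0$. The modified path cost differs from the true path cost only by the constant $h(s_0)$, so minimizing one minimizes the other.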
[01:11:34] search on de um the next property that we have here for for a store is a star [01:11:37] we have here for for a store is a star is actually more efficient than uniform [01:11:39] is actually more efficient than uniform cost search and we kind of have already [01:11:41] cost search and we kind of have already seen this right like like the whole [01:11:43] seen this right like like the whole point of a star is to not explore [01:11:45] point of a star is to not explore everything and explore in a directed [01:11:46] everything and explore in a directed manner so if you remember uniform cost [01:11:50] manner so if you remember uniform cost search like how does it explore well it [01:11:52] search like how does it explore well it explores all the states that have a past [01:11:54] explores all the states that have a past cost that are less than the past cost of [01:11:57] cost that are less than the past cost of ascent so again remember the uniform [01:12:01] ascent so again remember the uniform cost search you're exploring with the [01:12:02] cost search you're exploring with the with the order of path cost of states [01:12:05] with the order of path cost of states and then we explore all those states [01:12:07] and then we explore all those states that [01:12:07] that Haskell's less than the den state a star [01:12:12] Haskell's less than the den state a star like the thing that they have stored us [01:12:14] like the thing that they have stored us is it explores less states so it [01:12:16] is it explores less states so it explores states that have a past cost [01:12:18] explores states that have a past cost less than past cost of the end state - [01:12:22] less than past cost of the end state - the heuristic so so if you kind of look [01:12:24] the heuristic so so if you kind of look at the right side the right side just [01:12:26] at the right side the right side just became become smaller right like the [01:12:29] became become smaller right like the 
right side for uniform cost search was [01:12:31] right side for uniform cost search was just past cost of SN now it is past cost [01:12:33] just past cost of SN now it is past cost of ascent - the heuristic so it just [01:12:35] of ascent - the heuristic so it just became smaller and then why did it [01:12:37] became smaller and then why did it become smaller because now I'm doing [01:12:39] become smaller because now I'm doing this more directed search I'm not [01:12:40] this more directed search I'm not searching everything uniformly around me [01:12:42] searching everything uniformly around me and and that's the whole point of the [01:12:44] and and that's the whole point of the heuristic okay and that makes it [01:12:46] heuristic okay and that makes it actually more efficient so and then kind [01:12:49] actually more efficient so and then kind of the interpretation of this is if H is [01:12:51] of the interpretation of this is if H is larger than then that's better right [01:12:53] larger than then that's better right like if my heuristic is as large as [01:12:55] like if my heuristic is as large as possible well well that is better [01:12:57] possible well well that is better because then I am kind of exploring a [01:12:59] because then I am kind of exploring a smaller like area to get to the solution [01:13:02] smaller like area to get to the solution the proof of this is like two lines so [01:13:04] the proof of this is like two lines so I'm gonna escape that so let me actually [01:13:06] I'm gonna escape that so let me actually show how this looks like so if I'm [01:13:10] show how this looks like so if I'm trying to get from a start to s and [01:13:12] trying to get from a start to s and again if I'm doing uniform cost search [01:13:14] again if I'm doing uniform cost search I'm uniformly exploring so like all [01:13:17] I'm uniformly exploring so like all states around me and that is equivalent [01:13:19] states around me and that is equivalent to 
assuming that the heuristic is equal [01:13:21] to assuming that the heuristic is equal to zero like it's basically uniform cost [01:13:23] to zero like it's basically uniform cost search is a star when the heuristic is [01:13:26] search is a star when the heuristic is equal to zero so what is the point of [01:13:29] equal to zero so what is the point of the heuristic the point of the heuristic [01:13:31] the heuristic the point of the heuristic is to estimate what the future cost this [01:13:32] is to estimate what the future cost this if I know what the future costs is then [01:13:35] if I know what the future costs is then then H of s is just equal to future cost [01:13:38] then H of s is just equal to future cost and then a and that would be awesome and [01:13:40] and then a and that would be awesome and and I only need to like explore that [01:13:42] and I only need to like explore that green kind of space and then the thing [01:13:44] green kind of space and then the thing I'm exploring is it's just the notes [01:13:46] I'm exploring is it's just the notes that are under minimum past cough and [01:13:48] that are under minimum past cough and call cost path and I'm not exploring [01:13:51] call cost path and I'm not exploring anything extra right like that's the [01:13:53] anything extra right like that's the most like efficient thing one can do in [01:13:56] most like efficient thing one can do in practice like I don't have access to [01:13:58] practice like I don't have access to future costs right and in practice if I [01:14:00] future costs right and in practice if I had access to future costs like the [01:14:01] had access to future costs like the problem was solved I have access to some [01:14:04] problem was solved I have access to some heuristic that is some estimate of the [01:14:06] heuristic that is some estimate of the future cost it's not as bad as uniform [01:14:08] future cost it's not as bad as uniform cost search it's getting close to future 
[01:14:10] cost search it's getting close to future costs like look the value of future cost [01:14:12] costs like look the value of future cost and you're kind of somewhere in between [01:14:14] and you're kind of somewhere in between so it is going to be more efficient than [01:14:16] so it is going to be more efficient than uniform cost search in some sense okay [01:14:19] uniform cost search in some sense okay all right so so basically the whole idea [01:14:23] all right so so basically the whole idea of a star is it kind of distorts edge [01:14:26] of a star is it kind of distorts edge edge cost and favor sees and States so [01:14:28] edge cost and favor sees and States so I'm going to add here that a star is [01:14:30] I'm going to add here that a star is efficient so that is the other thing [01:14:35] okay all right so so these are all cool [01:14:39] okay all right so so these are all cool properties one more property about here [01:14:42] properties one more property about here is six and then after that we can talk [01:14:43] is six and then after that we can talk about your lack stations so so there's [01:14:46] about your lack stations so so there's also this other property called [01:14:47] also this other property called admissibility [01:14:48] admissibility which is something that we have kind of [01:14:50] which is something that we have kind of been talking about already right like [01:14:51] been talking about already right like we've been talking about how this [01:14:52] we've been talking about how this heuristic [01:14:53] heuristic should get close to future cost and [01:14:55] should get close to future cost and should be an estimate of the future cost [01:14:57] should be an estimate of the future cost so an admissible heuristic is a [01:14:59] so an admissible heuristic is a heuristic where H of s is less than or [01:15:02] heuristic where H of s is less than or equal to future cost and then the cool [01:15:04] equal to future cost and 
[01:15:06] And the cool thing is: if you already have consistency, then you have admissibility too — if you already have that property, then you have admissibility as well. So another property is admissible, which means h(s) ≤ FutureCost(s). The proofs of these are again mostly one-liners — this one is more than one line, but it's actually quite easy; it's in the notes, and you can use induction to prove that if you have consistency, then you're going to have admissibility too.

[01:15:46] Okay, so we've just talked about how A* is this efficient thing. We haven't talked about how you come up with heuristics, but we have talked about consistent heuristics, which are going to be useful — they give us admissibility, they give us correctness — and how A* is going to be this very efficient thing.
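The induction mentioned above (it is worked out in the course notes) can be sketched like this, working backwards from the end state:

```latex
% Base case, from consistency:
h(s_{\mathrm{end}}) = 0 = \mathrm{FutureCost}(s_{\mathrm{end}})

% Inductive step: triangle inequality, then the inductive hypothesis
% h(\mathrm{Succ}(s,a)) \le \mathrm{FutureCost}(\mathrm{Succ}(s,a)):
h(s) \le \mathrm{Cost}(s, a) + h(\mathrm{Succ}(s, a))
     \le \mathrm{Cost}(s, a) + \mathrm{FutureCost}(\mathrm{Succ}(s, a))

% The bound holds for every action a, hence for the minimizing one:
h(s) \le \min_a \bigl[\mathrm{Cost}(s, a) + \mathrm{FutureCost}(\mathrm{Succ}(s, a))\bigr]
     = \mathrm{FutureCost}(s)
```

Since the bound holds for every action, it holds for the best action, and the minimum over actions of cost-plus-future-cost is exactly the future cost of s — which is the admissibility statement h(s) ≤ FutureCost(s).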
efficient thing [01:16:01] going to be this very efficient thing but we actually have not talked about [01:16:03] but we actually have not talked about how to come up with heuristics so let's [01:16:06] how to come up with heuristics so let's spend the next yeah couple of minutes [01:16:09] spend the next yeah couple of minutes talking about talking about how to come [01:16:12] talking about talking about how to come up with heuristics and then the main [01:16:14] up with heuristics and then the main idea here is just relax the problem just [01:16:17] idea here is just relax the problem just relaxation so so what are so so the way [01:16:20] relaxation so so what are so so the way we come up with heuristics is we pick [01:16:22] we come up with heuristics is we pick the problem and just make it easier and [01:16:24] the problem and just make it easier and solve that easier problem so so that is [01:16:25] solve that easier problem so so that is kind of the whole idea of it so remember [01:16:28] kind of the whole idea of it so remember the HMS is supposed to be close to [01:16:31] the HMS is supposed to be close to future cost [01:16:33] future cost and some of these problems can be really [01:16:35] and some of these problems can be really difficult right so this so if you have a [01:16:37] difficult right so this so if you have a lot of constraints and it becomes harder [01:16:39] lot of constraints and it becomes harder to solve the problem so if you relax it [01:16:41] to solve the problem so if you relax it and we just remove the constraints we [01:16:42] and we just remove the constraints we are solving a much easier problem and [01:16:44] are solving a much easier problem and that could be used as a heuristic as a [01:16:46] that could be used as a heuristic as a value of heuristic that estimates what [01:16:48] value of heuristic that estimates what the future cost this so so you want to [01:16:53] the future cost this so so you want to remove constraints 
[01:16:54] And when we remove constraints, the cool thing that happens is that sometimes we get closed-form solutions, sometimes we get easier search problems that we can solve, and sometimes we get independent subproblems, and solving those gives us a good heuristic. So that is my goal: I want to find these heuristics. Let me go through a couple of examples. Say I have a search problem where I want the triangle to get to the circle, and there are all these walls in the way; that just seems really difficult. So what is a good heuristic here? I'm going to relax the problem: remove all those walls, just knock them down, and solve that problem instead. That seems much easier. In fact, now I have a closed-form solution for getting the triangle to the circle.
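With the walls knocked down and unit-cost moves in the four directions, that closed form is just the Manhattan distance. A minimal sketch (grid-coordinate states assumed):

```python
# Relaxation: no walls, unit-cost moves up/down/left/right. The cheapest
# path cost between two grid cells then has a closed form, the Manhattan
# distance, which can be used directly as h(s) in A*.
def manhattan_h(state, goal):
    (r, c), (gr, gc) = state, goal
    return abs(r - gr) + abs(c - gc)

print(manhattan_h((0, 0), (3, 4)))   # 7: three rows plus four columns
```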
[01:17:41] I can just compute the Manhattan distance and use it as a heuristic. Again, it's not the actual future cost, but it is an approximation of it. So you can usually think of heuristics as optimistic views of what the future cost is: an optimistic view of the problem. If there were no walls, how would I get from one location to another? The solution to that gives you an estimate of the future cost, which is h(s). Or take the tram problem: say we have a more difficult version of it, with a constraint that says you can't take more tram actions than walk actions. So now this is my search problem.
[01:18:29] I need to solve this, and it seems kind of difficult. We talked last time about how to come up with states for it, and even that seems hard: I need to track the location and the difference between the number of walk and tram actions, so I have on the order of n² states now. Instead of doing that, let me just remove the constraint and relax it. After relaxing, I have a much easier search problem to deal with: I only have the location, and I can just work with that, and everything will be great.

[01:19:04] So the idea for this middle case is: if I remove these constraints, I get these easier search problems, these relaxations, and I can compute the future cost of a relaxation using my favorite techniques, like dynamic programming or uniform cost search. But one thing to notice is that I need to compute it for states 1 through n, because the heuristic is a function of the state: I actually need the future cost of the relaxed problem for all states from 1 through n, and that gives me a better estimate. There are some engineering details here. We are looking for future costs, so if you plan to use uniform cost search, maybe because dynamic programming doesn't work in your setting, you need a bit of engineering to make it work: remember, uniform cost search works on past costs, not future costs, so you need to create a reversed problem in which you can actually compute future costs. So, a few engineering things.
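That reversed-problem trick can be sketched in a few lines. A hedged sketch, where the tram dynamics (walk from s to s+1 at cost 1, tram from s to 2s at cost 2) follow the course's running example and the function names are made up:

```python
import heapq

# Dijkstra / uniform cost search computes cheapest *past* costs from a
# source, but the heuristic needs FutureCost(s) for every state s = 1..n.
# Trick: reverse every edge and run Dijkstra from the goal; distances in
# the reversed graph are exactly costs-to-goal in the original problem.
def future_costs(n, succ, goal):
    """succ(s) -> list of (successor, cost); states are 1..n."""
    rev = {s: [] for s in range(1, n + 1)}
    for s in range(1, n + 1):
        for t, cost in succ(s):
            rev[t].append((s, cost))             # reversed edge: t -> s
    dist = {s: float('inf') for s in range(1, n + 1)}
    dist[goal] = 0.0
    frontier = [(0.0, goal)]
    while frontier:
        d, s = heapq.heappop(frontier)
        if d > dist[s]:
            continue                             # stale queue entry
        for t, cost in rev[s]:
            if d + cost < dist[t]:
                dist[t] = d + cost
                heapq.heappush(frontier, (d + cost, t))
    return dist                                  # dist[s] = FutureCost(s)

# Relaxed tram problem: walk s -> s+1 (cost 1) or tram s -> 2s (cost 2).
n = 10
def succ(s):
    return [(t, c) for t, c in [(s + 1, 1), (2 * s, 2)] if t <= n]

h = future_costs(n, succ, n)   # e.g. h[9] = 1 (walk), h[5] = 2 (tram)
```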
[01:20:08] But beyond that, it is basically just running the search algorithms we know on these relaxed problems; that gives us a heuristic value, we put it into our original problem, and we go solve it. Okay, another cool thing heuristics give us is this idea of independent subproblems. Here's another example: I want to solve the 8-puzzle, moving tiles around to reach a new configuration, and that seems hard. Again, a relaxation is to assume the tiles can overlap. The original problem says the tiles cannot overlap; I'm going to relax it and say you can go wherever you want and overlap. That is much simpler, and now I have eight independent problems, one for moving each tile from its location to its goal location, and I have a closed-form solution for each of them.
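Under that relaxation the heuristic is the sum of the tiles' individual Manhattan distances. A minimal sketch (row-major 3x3 board encoding assumed, blank encoded as 0):

```python
# Relaxation: tiles may overlap, so each tile moves independently and its
# cheapest cost home is its Manhattan distance; the heuristic sums over
# the eight tiles (the blank, 0, contributes nothing).
def eight_puzzle_h(board, goal):
    """board, goal: tuples of 9 ints (0 = blank), row-major 3x3."""
    pos = {v: (i // 3, i % 3) for i, v in enumerate(board)}
    gpos = {v: (i // 3, i % 3) for i, v in enumerate(goal)}
    return sum(abs(pos[v][0] - gpos[v][0]) + abs(pos[v][1] - gpos[v][1])
               for v in range(1, 9))

goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
start = (1, 2, 3, 4, 5, 6, 7, 0, 8)   # one slide away from the goal
assert eight_puzzle_h(start, goal) == 1
```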
[01:20:58] Because each of those is again a Manhattan distance. That gives me a heuristic; it's an estimate, it's not perfect, but I can use that estimate in my original search problem to solve it. So these were some examples of this idea of removing constraints to come up with better heuristics: knocking down walls, letting the tram travel freely, letting puzzle pieces overlap. That lets you solve a new, easier problem; in effect, you're reducing edge costs from infinity down to some finite cost. All right, I'm going to wrap up here, and I guess we'll talk about these last few slides next time, since we're running late, but I think you've got the main idea. See you next time.

================================================================================ LECTURE 019 ================================================================================ Markov Decision Processes 1 - Value Iteration |
Stanford CS221: AI (Autumn 2019) Source: https://www.youtube.com/watch?v=9g32v7bK3Co --- Transcript [00:00:05] Okay, let's start, guys. The plan is to take two days to catch up, since we're a little behind; that's okay. Today I want to talk about MDPs, Markov decision processes. My plan is to talk about that for the first hour, then spend ten minutes on the previous lecture: remember, we went over relaxation kind of quickly, so maybe we can go over that again. Then in the last ten minutes I want to talk about the project and the plan for it, how you should think about it; it's coming up, so we should start talking about it. This is an optimistic plan, though; let's see how it goes, but that's the current plan. Okay, all right, let's get into it: Markov decision processes. Let's start with a question, and let's actually do this one just by hand.
[00:00:57] You don't need to go to the website. The question is: it's Friday night, you want to go to Mountain View, you have a bunch of options, and you want to get there in the least amount of time. Which of these modes of transportation would you use? How many of you would bike? No one. How many of you would drive? That's popular. Caltrain? Some people would take the Caltrain, sounds good. Uber or Lyft? We have a good distribution. And a good number of you would go on a flight; as flying cars become a thing, that could be an option in the future, and there are actually a lot of startups working on flying cars.

[00:01:39] But as you think about this problem, the way to think about it is that there are a bunch of uncertainties in the world. It's not necessarily a search problem: you could bike and get a flat tire, and you don't know that in advance, so you have to take it into account. If you're driving, there could be traffic; if you're taking the Caltrain, there are all sorts of delays with the Caltrain; and there are all sorts of other uncertainties in the world that you need to think about. So it's not just a pure search problem where you pick your route and go with it: things can happen that affect your decision. And that takes us to Markov decision processes. We've talked about search problems, where everything was deterministic; now we're talking about the next class of state-based models, Markov decision processes. The idea is that you take actions, but you might not actually end up where you expected to.
[00:02:30] Because there's this nature around you, this world around you, that is uncertain and does stuff you didn't expect. So far we've talked about search problems: you start with a state, you take an action, and you deterministically end up in a new state. If you remember the successor function, Succ(s, a) would always give us s', and we would deterministically end up in s'. So in the graph up there, if you start in s and decide to take action 1, you're going to end up in a; there's no other option. And the solution to these search problems was a path, a sequence of actions: if I know I take action 1, then action 3, then action 2, I know exactly where I'm going to end up.

[00:03:18] Okay, so when we think about Markov decision processes, that is the setting where we have uncertainty in the world and we need to take it into account. The idea is: you start in a state, you decide to take an action, but then you can randomly end up in different states, say s1' or s2', because there are so many other things happening in the world, and you need to worry about that randomness and make decisions based on it. And this comes up pretty much everywhere, in every application. It comes up in robotics: if you have a robot that wants to go pick up an object, you decide on your strategy and everything is great, but when it comes to actually moving the robot and getting it to do the task, the actuators can fail.
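That contrast between a deterministic successor and a distribution over next states can be sketched as follows; the state names, probabilities, and rewards here are invented for illustration:

```python
import random

# Search problem: succ(s, a) returns exactly one next state.
def succ(s, a):
    return 'A' if a == 1 else 'B'                  # deterministic

# MDP: taking action a in state s yields a *distribution* over next
# states, written as (nextState, prob, reward) triples.
def succ_prob_reward(s, a):
    if a == 'grasp':                               # actuators can fail
        return [('holding', 0.8, 10), ('dropped', 0.2, -1)]
    return [(s, 1.0, 0)]                           # 'wait' does nothing

def sample_transition(s, a):
    """Draw one next state and reward from the transition distribution."""
    r, acc = random.random(), 0.0
    for s2, p, reward in succ_prob_reward(s, a):
        acc += p
        if r < acc:
            return s2, reward
    return s2, reward                              # guard against rounding
```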
[00:04:01] Or you might have all sorts of obstacles around you that you didn't think about. So there is uncertainty about the environment, or uncertainty about your model, like your actuators, that you don't necessarily think about, and in reality it affects your decisions and where you end up. This comes up in other settings, like resource allocation: maybe you're deciding what product to produce, and that depends on customer demand, which you might not have a good model of; it's uncertain. It really depends on what products customers want and what they don't, and even if you have a model, it's not going to be accurate, and you need to do resource allocation under that uncertainty about the world. A similar thing happens in agriculture: for example, you want to decide what to plant, but you might not be sure about the weather, whether it's going to rain, or whether the crops are going to yield or not. So there is a lot of uncertainty in these decisions, and that takes these problems beyond search problems: they become problems where we have to make decisions under uncertainty.

[00:05:09] All right, another example: this is the volcano crossing example. We have an island, and you're on one side of it, in that black square over there. What we want to do is go from this black square to the other side of the island, where we have the scenic view; that's going to give us a lot of reward and happiness. So my goal is to go from one side of the island to the other, but there's a caveat.
[00:05:35] The caveat is that there's a small volcano in the middle of the island that I need to get past, and if I fall into it I'm going to get a -50 reward (really more like minus infinity, but for this example imagine a -50 reward for falling into the volcano). All right, so I have this slip probability setting here on the side. If my slip probability is zero, meaning I'm sure I'm not going to fall in, should I cross the island? Yes, I should cross, because I'm not going to fall into that -50; the slip probability is zero, I'll get my +20 reward, and everything will be great. But we've been talking about how the world is stochastic, and the slip probability is not going to be zero; maybe it's ten percent. So if there is a ten percent chance of falling into the volcano, how many of you would still cross the island?
[00:06:31] A good number. The optimal solution is actually shown by these arrows here, and yes, the optimal solution is still to cross the island. We're going to talk about all these terms, but the value here is basically the value you get at that beginning state; it's the expected utility you're going to get (we'll define it properly). It goes down, because there is some probability of falling into the volcano, but the best thing to do is still to cross. How about 20 percent? How many of you would do it with 20 percent? Fewer people; still, it turns out the optimal strategy is to cross. 30 percent? One person. With 30 percent, that's actually the point where you'd rather not cross, because there's a volcano there, and with that large a probability you could fall in.
[00:07:29] You could fall into the volcano, and the value goes down. So these are the types of problems we're going to work with. Yes? So the 20 is the reward you're going to get at that state, and the value is something you compute and propagate back; we'll talk in detail about how to compute the value. All right, so that was just an example of a Markov decision process. What we want to do in this lecture is, first, model these types of systems as Markov decision processes; then we'll talk about inference-type algorithms, how we do inference and come up with the best strategy. In the middle I'll talk about policy evaluation, which is not an inference algorithm but is a step towards it: it's the idea that if someone hands me a policy, I can evaluate how good it is.
[00:08:18] And we'll talk about value iteration, which tries to figure out the best policy to take. So that's the plan for today. Next lecture we're going to talk about reinforcement learning, where we don't actually know what the reward is and we don't know what the transitions are; that's the learning part of these MDP lectures, and Reid is actually going to do that lecture next, on Wednesday. Okay, so let's get into Markov decision processes. We have a bunch of examples throughout this lecture, so let's look at another one; I actually need volunteers for this. In this example we have a game, and the idea is that at any point in time you can choose between two actions: you can either stay, or you can quit.
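This dice game, with the payoffs described next (quit pays $10 and ends the game; stay pays $4, then with probability 1/3, a die showing 1 or 2, the game ends, else you choose again), can be sketched as an MDP and the "always stay" policy evaluated. The interface below mimics a succProbReward-style transition function; the exact names are just an illustration:

```python
# The dice game as an MDP: states 'in' (still playing) and 'end'.
def succ_prob_reward(state, action):
    """Return a list of (newState, prob, reward) triples."""
    if state == 'end':
        return []
    if action == 'quit':
        return [('end', 1.0, 10)]                  # $10, game over
    if action == 'stay':
        return [('in', 2/3, 4), ('end', 1/3, 4)]   # $4, die may end it
    raise ValueError(action)

# Policy evaluation for "always stay": V satisfies V = 4 + (2/3) V,
# whose fixed point is V = 12; iterating the update converges to it.
v = 0.0
for _ in range(100):
    v = sum(p * (r + (v if s2 == 'in' else 0.0))
            for s2, p, r in succ_prob_reward('in', 'stay'))

print(round(v, 6))   # 12.0: in expectation, staying beats quitting ($10)
```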
[00:09:06] Okay, if you decide to quit, I'm going to give you $10 (well, not actually, but imagine I'm going to give you $10) and then we'll end the game. And if you decide to stay, you're going to get $4 and then I'll roll the dice: if I get a one or a two we'll end the game, otherwise you continue to the next round and you can decide again. Okay, so who wants to play? All right, volunteer: do you want to stay or quit? [Quit.] So that was easy, you got your $10. Does anyone else want to play? [Stay.] Oh, you got $8, sorry. You kind of get the idea here, right? You have these actions, and with one of them, quitting, you deterministically get your $10 and you're done; with the other one it's probabilistic, and you want to see which one is better and what the best policy to take in this setting would be. So we'll come back to this question; we'll formalize this and go over it.

[00:10:26] Okay, so then you need to actually compute the expected utility, right? And that's what we want to do. So you might say: oh, I want to stay and then I get my four dollars, and then I want to quit and then I get 14, and maybe that is the way to go. That could be a strategy, but before doing that we are going to define what the optimal policy would be. One other thing to keep in mind for this particular problem (I'll talk about it in a minute): you find a policy, but the policy, the way we define it, is a function of state. So if you decide to stay, that is your policy; if you decide to not stay, that is your policy. We're not allowing switching right now; I'll talk about this later in the lecture, but I'll come back to this problem.
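The point that a policy is a function of state only can be sketched in a couple of lines of Python (the names `policy_stay`, `policy_quit`, and `act` are illustrative, not from the course code):

```python
# A policy maps each state to exactly one action -- it is a function of
# state only, so the agent cannot switch strategies mid-game.
policy_stay = {"in": "stay"}   # always stay while in the game
policy_quit = {"in": "quit"}   # quit immediately

def act(policy, state):
    """Return the action the policy prescribes for this state."""
    return policy[state]

print(act(policy_stay, "in"), act(policy_quit, "in"))  # stay quit
```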
[00:11:08] Okay, so if you decide that your policy, the thing you want to do, is to just keep staying, this is the probability distribution over the total rewards you're going to get: you're going to get four with some probability, then if you're lucky you're going to get eight, if you're luckier you're going to get 12, and if you're luckier still you're going to get 16, but the probabilities come down really quickly. So the thing we care about in this setting is the expected utility: in expectation, if I run this and average over all the possible paths, what value do I get? And for this particular problem it turns out that in expectation, if you decide to stay, you should get 12. So you got really unlucky that you got 8, but in general, in expectation, you should decide to stay.

[00:11:57] And we actually want to spend a little bit of time in this lecture thinking about how we get that 12: how to go about computing this expected utility, and based on that, how to decide what policy to use. Okay, and if you decide to quit, the expected utility is kind of obvious, right? Because you're quitting, and with probability one you're getting ten dollars, so ten dollars is the expected utility of quitting. [Student question] So when I said you roll a die: if you get a one or a two, we end the game, and if you get anything else, the other 2/3 of the time, you continue. So that's where the 1/3 and 2/3 come from. Okay, all right. I'll come back to this example; this is actually the running example throughout this lecture, so you'll see what the lecture is about.
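That 12 can be checked two ways: by solving the recurrence for the always-stay policy (each round pays $4, and with probability 2/3 the game continues, so V = 4 + (2/3)V, i.e. (1/3)V = 4, giving V = 12), and by simulation. A minimal sketch, assuming the game exactly as described above:

```python
import random

# Closed form: V = 4 + (2/3) * V  =>  (1/3) * V = 4  =>  V = 12
v_stay = 4 * 3

def simulate_stay(rng):
    """Play the dice game with the always-stay policy; return total reward."""
    total = 0
    while True:
        total += 4                 # staying pays $4 whatever the die shows
        if rng.random() < 1/3:     # die shows 1 or 2: the game ends
            return total

rng = random.Random(0)
n = 100_000
avg = sum(simulate_stay(rng) for _ in range(n)) / n
print(v_stay, round(avg, 2))  # the simulated average lands close to 12
```

Quitting, by contrast, gives exactly 10 with probability one, so in expectation staying is the better policy.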
[00:12:59] Okay, so actually, I do want to finish in an hour, and that's why maybe I'm rushing things a little bit, but we are going to talk about this problem throughout the class, so don't worry if it's not clear; at the end we can clarify things. All right, so I do want to formalize this problem, and the way I want to formalize it is as a Markov decision process.

[00:13:27] So in Markov decision processes, similar to search problems, you're going to have states. In this particular game I'm going to have two states: I'm either in the game or I'm out of the game. I'm in an end state when everything has ended: you're out of the game, you're done. Okay, so those are my states. Then in each of these states I can take an action, and if I'm in the "in" state I can take two actions: I can either decide to stay or I can quit. If I decide to stay from the "in" state, that takes me to something I'm going to call a chance node. A chance node is a node that represents a state and an action. It's not really a state (the blue things are my states), but I'm creating these chance nodes as a way of working through this example to see where things are going. So these blue states are my states, and these chance nodes are over state-action pairs. Basically, this node tells me that I started in "in" and decided to stay, and the chance node here tells me that I started in "in" and decided to quit. I deterministically go to the chance node, but then from the chance node, that's where I'm introducing the probabilities: from the chance node I can probabilistically end up in different states.

[00:15:08] In the case of quit it's also deterministic: we say with probability one I'm going to end up in this end state. I'm going to draw that with the edge that comes out of my chance node, and I'm going to say with probability one I'm going to get $10 and just be done. What if you're at this one? This is actually where interesting things can happen: with probability 2/3 I'm going to go back to "in" and get $4, or with probability 1/3 I'm going to end up in "end", also with $4. So that is my Markov decision process. Maybe you can keep track of a list of the things we are defining in this lecture: we just defined states, and then we said we're going to have these chance nodes, because from a chance node, which state you come out to is probabilistic, depending on what happens in nature.
[00:16:13] Right, this is the decision I've made; now nature kind of decides which one we are going to end up at, and based on that we move forward. All right, so more formally, we have a bunch of things to define for an MDP; similar to search problems, we now need to define the same kind of set of things. We have a set of states: in this case my states are "in" and "end". We have a start state: I'm starting with "in", so that's my start state. I have actions as a function of state: when I ask what the actions of the "in" state are, my actions are going to be stay or quit; what are the actions of "end"? Nothing, the end state doesn't have any actions that come out of it. And then we have these transition probabilities: more formally, a transition probability takes a state, an action, and a new state, so s, a, s', and tells me the transition probability of that; it's 1/3 in this case. And then I have a reward, which tells me how rewarding that was: four dollars.

[00:17:21] So when I'm defining my MDP, the new thing I'm defining is this transition probability. It tells me: if you're in state s and take action a, and you end up in s', what is the probability of that? I'm in "in", I decide to stay, and I end up in "end": what's the probability of that? One-third. Maybe I'm in "in", I decide to quit, and I end up in "end": what's the probability of that? It's equal to one. Okay, and then over the same state, action, and next state s' that we end up at, we are going to define a reward, which tells me how much money I got, how good that was. It was four dollars in this case, or, if I decide to quit, I got ten dollars. And if you remember, in the case of search problems we were talking about cost; I'm just flipping the sign here. There we wanted to minimize cost, here we want to maximize the reward; it's just a more optimistic view of the world, I guess. So that is what the rewards are going to be. You also define this IsEnd function, which, again similar to search problems, just checks whether we are in an end state or not. And in addition to that, we have something called a discount factor: it's this value gamma you choose between zero and one. I'll talk about this later; don't worry about it right now, but it's a thing we define for our MDPs.

[00:18:49] All right, so how do I compare this with search? Again, these were the things we had in a search problem: we had the successor function that would deterministically take me to s', and we had this cost function that would tell me the cost of being in state s and taking action a.
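The pieces just listed (states, start state, Actions(s), transition probabilities T(s, a, s'), Reward(s, a, s'), IsEnd(s), and the discount factor gamma) can be collected into a small class. A sketch for the stay/quit game; the class and method names here are illustrative, not the course's exact code:

```python
class DiceGameMDP:
    """The stay/quit dice game as an MDP."""

    def start_state(self):
        return "in"

    def is_end(self, state):
        return state == "end"

    def actions(self, state):
        # the end state has no actions coming out of it
        return [] if self.is_end(state) else ["stay", "quit"]

    def succ_prob_reward(self, state, action):
        """Return a list of (s', T(s, a, s'), Reward(s, a, s')) triples."""
        if action == "quit":
            return [("end", 1.0, 10)]          # deterministic: $10 and done
        # stay: $4 either way; die shows 1 or 2 (prob 1/3) ends the game
        return [("in", 2/3, 4), ("end", 1/3, 4)]

    def discount(self):
        return 1.0  # gamma; discussed later in the lecture

mdp = DiceGameMDP()
print(mdp.succ_prob_reward("in", "stay"))
```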
[00:19:04] So the major things that changed are: instead of a successor function, I have transition probabilities, these T's that basically tell me the probability of starting in s, taking action a, and ending up in s'; and the cost just became a reward. Those are the major differences between search and MDPs, because things are not deterministic. All right, so that was the formalism; now I can define any MDP, any Markov decision process. One thing to point out is that these transition probabilities T basically specify the probability of ending up in state s' if you take action a in state s. So these are probabilities, right? For example (we have done this example, but let's just do it on the slides again): if I'm in state "in" and I take action quit, I end up in "end"; what's the probability of that? One. If I'm in state "in" and I take action stay and I end up in state "in" again, what's the probability of that? Two-thirds. And if I'm in "in", I take action stay, and I end up in "end", the probability of that is one-third.

[00:20:21] And these are probabilities, so they need to add up to one. But one thing to notice is: what exactly adds up to one? All of the entries in a column of the table are not going to add up to one. What adds up to one is this: if you consider all the possible different s-primes that you could end up at, those probabilities add up to one. So if you look at this table again: being in "in" and taking action stay, the probabilities we have for the different s-primes are two-thirds and one-third, and those two are the ones that add up to one. And in the first case, if you're in "in" and you decide to quit, then whatever s-primes you could end up at (in this case it's just the end state), those probabilities add up to one. More formally, what that means is: if I sum over s', these new states I could end up at, the transition probabilities need to add up to one, because they're probabilities describing what can happen when I take an action. And these transition probabilities are going to be non-negative, because they are probabilities; that's another property. The usual things.

[00:21:42] All right, so let's formalize another problem, and this time let's actually try to code it up. What is this problem? This is the tram problem.
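That property (for every state s and action a, the probabilities over successor states s' are non-negative and sum to one) is easy to check mechanically. A sketch using the numbers from the table; storing T as nested dicts is my choice here, not the lecture's:

```python
# T[s][a] maps each successor s' to T(s, a, s')
T = {
    "in": {
        "stay": {"in": 2/3, "end": 1/3},
        "quit": {"end": 1.0},
    },
}

for s, by_action in T.items():
    for a, succ in by_action.items():
        # probabilities are non-negative...
        assert all(p >= 0 for p in succ.values())
        # ...and it is the row over s' (fixed s and a) that sums to one,
        # not a column of the table
        assert abs(sum(succ.values()) - 1.0) < 1e-9, (s, a)
print("every (s, a) row sums to 1")
```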
[00:21:52] Remember, I have blocks 1 through n, and I have two possible actions. I can either walk from state s to state s+1, or I can take the magic tram that takes me from state s to state 2s. If I walk, that costs 1 minute, which means the reward is minus 1; if I take the tram, that costs 2 minutes, which means the reward is minus 2. And the question was: how do we travel from 1 to n in the least amount of time? So nothing here is probabilistic yet, right? So I'm going to add an extra thing, which says the tram is going to fail with probability 0.5. I might decide to take the tram at some point, and that tram can fail with probability 0.5; if it fails, I stay in my state, I don't go anywhere, and in this case we're assuming you still lose the 2 minutes. So if I decide to take the tram, I'm going to lose 2 minutes; maybe it'll fail, maybe it won't.

[00:22:54] Okay, all right, so let's try to formalize this. We're going to take our tram problem from two lectures ago (this is from Search 1) and just copy that. All right, so this is what we had from last time: we had this transportation problem, and we had all these algorithms to solve the search problem. We don't really need them, because we have a new problem, so let's just get rid of them. Now I want to formalize an MDP, so it's a TransportationMDP. Okay, the initialization looks okay, the start state looks okay (I'm starting from 1), isEnd looks okay. So the thing I'm going to change is, first off, I need to add this actions function. What would actions do? It's going to return a list of potential actions given a state.
[00:23:53] So I just copy-pasted stuff from down there and edited it; it's going to return a list of valid actions. Okay, so what are the valid actions I can take? I can either walk or I can tram. So I'm going to remove all these extra things I had from before and just keep it to: I'm either walking or I'm taking the tram, as long as it's a valid state. So that looks right for actions. The only other thing we had was the successor-and-cost function, so now we want to change that to return these transition probabilities and the reward; it's basically successor probabilities and reward. So I'm putting those two together: similar to before, where we had successor and cost, now I'm returning probabilities and rewards. What this function is going to return is the new state s' I'm going to end up at, the probability value for that, and the reward of that.

[00:24:48] So given that I'm starting in state s and taking action a, what are the potential s-primes I can end up at, and with what probabilities? What is T(s, a, s'), and what is the reward, Reward(s, a, s')? I want a function that just returns these so I can call it later. All right, so I need to check what happens for each one of these actions. For action walk, what new state am I going to end up at? Well, I'm going to end up at s plus one; it's a deterministic action, so I end up there with probability one. And what's the reward of that? Minus one, because it costs one minute, so it's a reward of minus one. Then for action tram we do much the same thing, but you have two options here: I can end up in 2s if the tram doesn't fail, with probability 0.5.
that [00:25:55] this probability 0.5 that cost that reward of that is minus 2 or the other [00:25:58] reward of that is minus 2 or the other option is I'm going to end up in state s [00:26:01] option is I'm going to end up in state s cuz I didn't go anywhere because we [00:26:03] cuz I didn't go anywhere because we probability point 5 to Tran [00:26:04] probability point 5 to Tran to fail and that cut that nerve world of [00:26:07] to fail and that cut that nerve world of that is - - and that's pretty much it [00:26:10] that is - - and that's pretty much it that that is my my MVP so I can just [00:26:14] that that is my my MVP so I can just define this for a city with let's say [00:26:16] define this for a city with let's say ten blocks oh and we need to have the [00:26:19] ten blocks oh and we need to have the discount factor but we'll talk about [00:26:21] discount factor but we'll talk about that later let's say it's just one for [00:26:23] that later let's say it's just one for now yeah and they'll use right I'm [00:26:26] now yeah and they'll use right I'm writing this other states functions were [00:26:28] writing this other states functions were later but that look right just [00:26:32] later but that look right just formalized this MVP so let's check if it [00:26:37] formalized this MVP so let's check if it does the right thing so maybe we want to [00:26:39] does the right thing so maybe we want to know what are the actions from state 3 [00:26:41] know what are the actions from state 3 what are the actions from state 3 oh we [00:26:44] what are the actions from state 3 oh we need to remove this you to a function [00:26:46] need to remove this you to a function from before because we don't have it in [00:26:47] from before because we don't have it in the folder move that what are the [00:26:51] the folder move that what are the actions from state 3 I have 10 blocks if [00:26:55] actions from state 3 I have 10 blocks if I'm in state 3 I can either walk or 
tram [00:26:57] I'm in state 3 I can either walk or tram right or one of them is fine right so so [00:27:01] right or one of them is fine right so so that did the right thing [00:27:03] that did the right thing maybe we want to just check if this [00:27:06] maybe we want to just check if this successor probability and your horde [00:27:08] successor probability and your horde function does the right thing so maybe [00:27:10] function does the right thing so maybe maybe we can try that out for state 3 [00:27:13] maybe we can try that out for state 3 and walk so so for step 3 and action [00:27:15] and walk so so for step 3 and action walk then what do we get well we end up [00:27:18] walk then what do we get well we end up in 4 and that it that is with [00:27:22] in 4 and that it that is with probability 1 with the reward of minus 1 [00:27:25] probability 1 with the reward of minus 1 okay let's try that for tram again [00:27:31] okay let's try that for tram again remember tram can fail so I'm gonna get [00:27:33] remember tram can fail so I'm gonna get two things here so these are the things [00:27:36] two things here so these are the things I'm gonna get it for tram I'm going to [00:27:38] I'm gonna get it for tram I'm going to either end up in 6 with probability 0.5 [00:27:40] either end up in 6 with probability 0.5 with the reward of minus 2 or I will not [00:27:43] with the reward of minus 2 or I will not go anywhere I'm still at 3 with [00:27:45] go anywhere I'm still at 3 with probability 0.5 and that is with the [00:27:48] probability 0.5 and that is with the reward of minus 2 okay all right so that [00:27:54] reward of minus 2 okay all right so that was just a tram problem and we formalize [00:27:58] was just a tram problem and we formalize it as an MDP again the reason it's an [00:28:01] it as an MDP again the reason it's an MVP here is is that the tram can fail [00:28:03] MVP here is is that the tram can fail with probability 0.5 so we added that in 
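The class being live-coded here can be sketched roughly as follows. The method names and the `startState`/`isEnd` helpers are my reconstruction of the demo, not the course's exact code; the dynamics (walk to s+1 for reward -1, tram to 2s with probability 0.5 for reward -2) are taken from the transcript.

```python
class TransportationMDP:
    """Tram MDP: states are blocks 1..N; walk is deterministic, tram can fail."""

    def __init__(self, N):
        self.N = N  # number of blocks in the city

    def startState(self):
        return 1

    def isEnd(self, state):
        return state == self.N

    def actions(self, state):
        # Valid actions: walk to state+1, or tram to 2*state, if still in bounds.
        result = []
        if state + 1 <= self.N:
            result.append('walk')
        if 2 * state <= self.N:
            result.append('tram')
        return result

    def succProbReward(self, state, action):
        # Returns a list of (newState, probability, reward) triples.
        result = []
        if action == 'walk':
            result.append((state + 1, 1.0, -1))  # deterministic, costs 1 minute
        elif action == 'tram':
            result.append((2 * state, 0.5, -2))  # tram works
            result.append((state, 0.5, -2))      # tram fails: stay put, still pay 2
        return result

    def discount(self):
        return 1.0  # just 1 for now, as in the lecture

mdp = TransportationMDP(N=10)
print(mdp.actions(3))                 # ['walk', 'tram']
print(mdp.succProbReward(3, 'walk'))  # [(4, 1.0, -1)]
print(mdp.succProbReward(3, 'tram'))  # [(6, 0.5, -2), (3, 0.5, -2)]
```

The last three calls reproduce the checks done in the demo for state 3.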
[00:28:06] Okay, is everyone happy with how we're defining MDPs? It's pretty similar to search problems, except now we have these probabilities. All right, so now I have defined an MDP; that's great. The next question we'd generally like to answer is how to give a solution. [00:28:31] There's a question: the Markov part means that you depend only on the current state? Yes. The way we define our state, the state is sufficient for us to make optimal decisions about the future. The Markov part means it's Markov: the probability of ending up in the next state depends only on the current state and action. [00:28:57] So the interesting question we'd like to answer is: we want to find a solution. I want to figure out the optimal way to actually solve this problem. If you remember search problems, the solution to a search problem was just a sequence of actions; that's all I had, a sequence of actions, a path, and that was the solution. The reason that was a good solution was that everything was deterministic, so I could just give you the path and that's what you would follow. But in the case of MDPs, the way we define a solution is by using this notion of a policy. Let me actually write that here. So you've defined an MDP, but now I want to say: what is a solution of an MDP? A solution of a Markov decision process is a policy, pi(s). This policy goes from states: it takes any state and tells me the action I would take in that state. [00:29:58] So the policy is a function, a mapping from each state s in the set of all possible states to an action in the set of all possible actions. In the case of volcano crossing, I can have something like this: I can be in state (1,1), and the policy at that state could be to go south; or I can be in state (2,1) and have a policy for that state too. If this were a search problem, I would just give a path: go south, then go east, then go north, and that would be my solution. But again, if I decide that the policy at (1,1) is to go south, there's no reason you'll actually end up to the south, because this thing is probabilistic. So the best I can do is, for every state, just tell you the best thing to do in that particular state, and that's why we define a policy as opposed to giving a full path.
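The "policy is a mapping from states to actions" idea can be made concrete with a plain dictionary. These particular grid states and directions are illustrative, not the lecture's actual volcano-crossing policy:

```python
# A policy maps every state to an action. For a small grid world it can be
# written out explicitly as a dict; the entries below are made up for
# illustration, in the style of the volcano-crossing example.
policy = {
    (1, 1): 'south',
    (2, 1): 'east',
    (2, 2): 'east',
    (2, 3): 'north',
}

def pi(state):
    # Look up the action the policy prescribes for this state.
    return policy[state]

print(pi((1, 1)))  # 'south'
```

Note that the policy says what to *try*; where you actually land is still up to the probabilistic transitions.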
[00:30:53] All right, so a policy is the thing you're looking for, and ideally I'd like to find the best policy, the one that gives me the right solution. But in order to get there, I want to spend a little bit of time talking about how good a given policy is; that's this idea of evaluating a policy. In this middle section I'm not trying to find a policy. I just assume you give me a policy, and I can evaluate it and tell you how good it is. That's the plan for the middle section. Everyone happy so far? All I've done is define an MDP, which is very similar to a search problem; it's just probabilistic. [00:31:30] So how would we evaluate a policy? If you give me a policy, which basically tells me to take some action at every state s, then that policy is going to generate a random path. In fact I can get many different random paths, [00:31:46] because nature behaves differently each time and the world is uncertain. So I get a bunch of random paths, and those paths are random variables. For each one of those random paths I can define a utility. What is a utility? The utility is just the sum of rewards I get over that path; really it's the discounted sum of rewards. We'll talk about discounting, which lets you discount the future, but for now just assume it's the sum of rewards along the path. [00:32:19] So the utility you get is also going to be a random variable, because the policy generates a bunch of random paths, and the utility is just the sum of rewards of each one of them. [00:32:39] If you remember this example: I can have a path that starts in "in", then stays, and then the game ends. That's one random path, and for that particular random path the utility I get is four dollars. That's one possible thing that can happen. If my policy is, say, "stay", there's no reason for the game to end right there; I can get a lot of different random paths. I can have a situation where I stay three times and after that the game ends, and the utility of that is twelve. We can have the situation stay, stay, end, and there the utility is eight. And so on. So you're getting all these utilities for all these random paths, and these utilities are also just random variables. So I can't really do much with a single random utility.
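The three sample paths just mentioned can be checked numerically. The $4 reward per "stay" comes from the lecture's dice-game example; with no discounting yet, the utility of a path is just the sum of its rewards:

```python
# Utility of one sampled path = sum of the rewards along it (gamma = 1 here).
# The reward of 4 per "stay" follows the lecture's dice-game example.
def path_utility(rewards):
    return sum(rewards)

print(path_utility([4]))        # stay once, then the game ends: utility 4
print(path_utility([4, 4]))     # stay, stay, end: utility 8
print(path_utility([4, 4, 4]))  # stay three times, then end: utility 12
```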
[00:33:28] A random variable isn't telling me nothing, it's telling me something, but I can't optimize a random variable. So instead we need to define something we can actually work with, and that is this idea of a value, which is just an expected utility. The value of a policy is the expected utility of that policy, and that's not a random variable anymore; it's actually a number. I can compute that number for every state and then work with values. [00:34:05] So the question is: when you say "value of a policy", is a policy basically telling me the strategy for all possible states? Well, we're defining a policy as a function of state, and value is the same kind of thing, a function of state. I might ask: what is the value of being in "in"? The value of being in "in" and following the policy "stay" is the expected utility of following "stay" from that particular state, which is basically that twelve up there. I could ask the same about any other state: I can be in any other state and ask what its value is. And when we do value iteration, you'll actually need to compute this value for all states, to have an idea of how to get from one state to another; it tells you what it's worth to be in state "in". [00:35:00] And yes, the policy, given you're in state "in", is to take the action "stay", and that is what the twelve is. We've kind of seen empirically that it's twelve, but we haven't shown how to get twelve yet. [00:35:16] All right, let me actually write these in my list of things. We talked about the policy; what else did we talk about? We talked about utility. What is utility? Utility, we said, is our rewards: if I get reward r1 and then reward r2, it's the discounted sum of rewards. I'm going to use this gamma, the discount I'll talk about in a little bit, so the utility is u = r1 + gamma * r2 + gamma^2 * r3 + ... and so on. You give me a random path, and I just sum up the rewards along it. If gamma is 1, I'm just summing up the rewards; if gamma is not 1, I'm looking at this discounted sum. [00:35:54] So that is utility, and value is just the expected utility: you give me a bunch of random paths, I can compute their utilities, sum them up and average them, and that gives me the value. [00:36:14] That's a very good question; we'll get back to it. In general, if the graph is acyclic, it's fine; but if you have a cyclic graph, you want your gamma to be less than one.
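The discounted sum just written on the board can be expressed as a one-line helper (a sketch, not course code):

```python
# Discounted utility of a path:
#   u = r1 + gamma * r2 + gamma^2 * r3 + ...
def utility(rewards, gamma=1.0):
    return sum(gamma ** t * r for t, r in enumerate(rewards))

print(utility([4, 4, 4]))             # gamma = 1: plain sum, 12.0
print(utility([4, 4, 4], gamma=0.5))  # 4 + 2 + 1 = 7.0
```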
[00:36:25] We'll talk about that when we get to convergence. All right, let's go to this particular volcano crossing example. [00:36:39] In this case I can run the game, and every time I run it I get a different utility, because I end up on some random path; some of them end in the volcano, which is pretty bad, so I get different utility values. But the value, which is the expected utility, isn't really changing: it's just around 3.7, which is the average of these utilities. I can keep running this and keep getting different utilities, but the value is this one number that I can talk about. That's the value of this particular state, and it tells me what the best policy I can take is worth, the best amount of utility I can get, in expectation, from that state.
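The "value is the average of many sampled utilities" idea can be sketched with the dice game. Treat the dynamics as an assumption here: following the lecture's earlier setup, "stay" pays $4 and the game then ends with probability 1/3, which makes the expected utility the twelve discussed above.

```python
import random

# Value = expected utility. Estimate it by averaging the utilities of many
# random paths generated by a fixed policy. Dynamics assumed from the
# lecture's dice game: "stay" pays $4, then the game ends with probability 1/3.
def simulate_stay(rng):
    total = 0
    while True:
        total += 4              # reward for choosing "stay"
        if rng.random() < 1 / 3:  # the die ends the game
            return total

rng = random.Random(0)
utilities = [simulate_stay(rng) for _ in range(100_000)]
value = sum(utilities) / len(utilities)
print(value)  # hovers near the expected utility of 12
```

Each individual utility varies a lot (4, 8, 12, ...), but the average settles down, which is exactly why value, not utility, is the thing to optimize.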
[00:37:29] All right, so we've been talking about this utility. I've actually written it on the board already: utility is a discounted sum of rewards. And we've been talking about this discount factor. The idea of the discount factor is that I might care about the future differently from how much I care about now. For example, if you give me four dollars today and four dollars tomorrow, and that four dollars tomorrow has the same value to me as four dollars today, that's the idea of having a discount of one, gamma equal to 1: you're saving for the future, and the value of things in the future is the same. If you give me four dollars now, or four dollars ten years from now, I care about it as four dollars either way, and I can just add things up. [00:38:19] But it could also be the case that you're in a situation, in a particular MDP, where you don't care about the future as much: maybe you give me four dollars ten years from now and that has no value to me. If that is the case, and you just want to live in the moment and don't care about the values you'll get in the future, that's the other extreme, where this gamma, this discount, is equal to zero. That's a situation where four dollars in the future has zero value to me; I only care about right now, living in the moment, what money I'm going to get. [00:38:56] And in reality you're somewhere in between: we're not purely living in the moment, and we're also not in the case where everything has the same value now and in the future. A balanced life is a setting with some discount factor that's not zero and not one: it discounts values in the future, because the future may not have the same value as now, but we still value things, and four dollars is still worth something in the future. That's where we pick a gamma between zero and one. So it's a design choice: depending on what problem you're in, you might want to choose a different gamma. [00:39:44] It's not really an assessment of risk in that way; it depends on the problem. In a particular problem I might want to get values in the future, because I have some long-term goal I want to get to and I care about the future. It depends: if you're solving a game versus, [00:40:00] I don't know, a robot manipulation problem, the discount factors you'd use might be very different. For a lot of the examples we use in this class, you just choose a gamma that's close to one; for a lot of the problems we end up dealing with, gamma is 0.9, that's the usual. For a very different problem where you don't care about the future, you would just drop it. [00:40:24] Yes, okay, that's a good question: is gamma a hyperparameter that you need to tune? I would say gamma is a design choice. It's not a hyperparameter in the sense that you pick the right gamma and it will do the right thing; you want to pick a gamma that works well with your problem statement. And a gamma of zero is kind of greedy: you're picking the best thing right now, and you just don't care about the future, ever. [00:41:00] It doesn't affect the Markov property. The discount is about the reward, about how much you value reward in the future; it's not about how this state affects the next state. It's still a Markov decision process. Gamma affects the reward you're getting, but it's Markov because if I'm in state s and I take action a, I end up in s', and that doesn't depend on gamma. [00:41:33] All right. So in this section we've been talking about this idea that someone comes in and gives me the policy. The policy is pi, and what I want to do is figure out the value of that policy, and again, value is just expected utility.
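The three regimes just described (gamma of 1, 0, and a middle ground like the usual 0.9) can be compared on one reward stream, reusing the discounted-utility formula from earlier:

```python
# How the discount factor changes the worth of the same reward stream.
def utility(rewards, gamma):
    return sum(gamma ** t * r for t, r in enumerate(rewards))

rewards = [4, 4, 4]
print(utility(rewards, 1.0))            # 12.0: the future counts fully
print(utility(rewards, 0.0))            # 4.0: "live in the moment"
print(round(utility(rewards, 0.9), 2))  # 10.84: the usual middle ground
```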
[00:41:48] Okay, so V pi of s is just the expected utility received by following this policy pi from state s. So I'm not doing anything fancy; I'm not even trying to figure out what pi is. All I want to do is evaluate: if you tell me this is pi, how good is that? What's the value of that? So that's what a value function is. The value of a policy is V pi of s; that's the expected utility of starting in some state s. [00:42:41] And if someone tells me that I'm following policy pi, then I already know that from state s the action I'm going to take is pi of s. So that's very clear: I'll take pi of s, and if I take pi of s, I'm going to end up in some chance node. [00:43:02] And that chance node is a state-action node: it's the state s together with the action, and I've decided the action is pi of s. I'm going to define a new function, this Q function, Q pi of s and a, which is just the expected utility from the chance node. So we've talked about values as expected utilities from actual states; I'm going to talk about Q values as expected utilities from chance nodes. So after you have committed to taking action a, and you then follow policy pi, what is the expected utility from that point on? [00:43:46] Well, from that point on we are in a chance node, so many things can happen, because nature is going to roll its die. Each outcome happens with transition probability T of s, a, s prime, and with that transition probability I'm going to end up in a new state, which I'll call s prime. The value of that state, again the expected utility of that state, is V pi of s prime. [00:44:14] All right, so what are these actually equal to? I've just defined the value as an expected utility and the Q value as an expected utility from a chance node; what are they actually equal to? I'm going to write a recurrence that you're going to use for the rest of the class, so pay attention. There's a question: yes, both of them are expected values; one is just a function of the state, and for the other one you've committed to one action. The reason I'm defining both is that it makes writing my recurrence a little easier, because I have this state-action node and I can talk about the branching from these state-action nodes. All right, so I'm going to write a recurrence. It's not hard, but it's kind of the basis of the next n lectures, so pay attention.
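Restating the two definitions just given, in the lecture's notation:

```latex
\[
V_\pi(s) \;=\; \mathbb{E}\big[\,\text{utility} \mid \text{start in state } s,\ \text{follow } \pi\,\big],
\qquad
Q_\pi(s,a) \;=\; \mathbb{E}\big[\,\text{utility} \mid \text{start in chance node } (s,a),\ \text{then follow } \pi\,\big].
\]
```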
[00:45:12] All right, so V pi of s: what is that equal to? Well, it's equal to zero if I'm in an end state. If IsEnd of s is true, then there is no expected utility; it's equal to zero. That's the easy case. [00:45:30] Otherwise, someone told me to follow policy pi, so the value is just equal to the Q value. In this case V pi of s is just equal to Q pi of s and pi of s; these two are just equal to each other. So the next question one might ask is: what is Q pi of s and a equal to? [00:46:19] Okay, so if I'm at a chance node, there are a bunch of different things that can happen, and I can end up in these different s primes. So if I'm looking for the expected utility, I'm looking for the probability of ending up in each state times the utility of that state, summed up. So that is just equal to the sum over all possible s primes that I can end up at, of the transition probability T of s, a, s prime, times the immediate reward that I'm going to get, Reward of s, a, s prime, plus the value of the next state. What I care about is the discounted value, so I'm going to add gamma times V pi of s prime, because I'm talking about this next state. Is this clear? [00:47:17] Okay, so this is the recurrence we use in policy evaluation. Again, remember: someone came and gave me policy pi, and I just want to know how good policy pi is. I can do that by computing V pi. What is V pi equal to? Someone told me I'm following policy pi, so it's got to be equal to just Q pi. What is Q pi equal to? It's just the expectation over all the places I can end up at: the sum over s primes of the transition probability of ending up in s prime, times the total reward you're getting, which is the immediate reward plus the discounted value of my future, following policy pi from that next state. [00:48:01] All right, so far so good. So that is how I can evaluate this policy. I have these two recurrences, and I can just substitute one into the other. Imagine we are not in an end state. If you're not in an end state, then what is V pi of s equal to? It's just equal to the sum over s primes of the transition probability T of s, pi of s, s prime, times the immediate reward I'm going to get, plus the discounted value, gamma V pi of s prime. [00:48:54] Okay, so this is the recurrence I have; I literally just combined those two and wrote it in green, for the case where you're not in an end state. I have a pi here and a pi on the other side too. And that is the place where I can compute V pi: maybe I can do it iteratively, or maybe I can actually find a closed-form solution for some problems. But that is basically what I'm going to do: I have V pi as a function that depends on V pi of s prime, and I can just solve for this V pi. That allows me to evaluate policy pi. I haven't figured out a new policy; all I have done is evaluate the value of pi.
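Written out, the recurrence from the board (in the lecture's notation) is:

```latex
\[
V_\pi(s) =
\begin{cases}
0 & \text{if } \mathrm{IsEnd}(s),\\[4pt]
Q_\pi\big(s,\pi(s)\big) & \text{otherwise,}
\end{cases}
\qquad
Q_\pi(s,a) = \sum_{s'} T(s,a,s')\,\big[\mathrm{Reward}(s,a,s') + \gamma\, V_\pi(s')\big],
\]

and combining the two, for non-end states:

\[
V_\pi(s) = \sum_{s'} T\big(s,\pi(s),s'\big)\,\big[\mathrm{Reward}\big(s,\pi(s),s'\big) + \gamma\, V_\pi(s')\big].
\]
```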
[00:49:41] All right, so let's go back to this example. Let's say that someone comes in and tells me the policy you've got to follow is stay. So my policy is to stay, and I just want to evaluate that. When you're doing policy evaluation, you've got to compute that V pi for all states. So let's start with V pi of end: that is equal to 0, because we know V pi at the end state is just equal to 0. [00:50:06] Now I want to know: what's V pi of the state in? What is that equal to? That's just equal to Q pi of in and stay, right? So I'm going to replace that. That's just equal to 1/3 times (the immediate reward, which is 4, plus the value of the next state I end up at, which is end in this case) plus 2/3 times (the immediate reward I'm going to get, which is 4 dollars, plus the value of the state I end up at, which is in). [00:50:42] So that is just the sum we have there. V pi of end is 0, so let me just put that 0 in. I only have one state here, so I have this as a function of this one state, in. Having one equation, I can find the closed-form solution for V pi of in: I just move things around a little bit, and then I find that V pi of in is just equal to 12. So that's how you get that 12 I've been talking about. You just found out that if you tell me the policy to follow is stay, then the value of that policy from state in is equal to 12. [00:51:27] Yeah, so the policy is a function of state, and I only have this one interesting state here, which is in.
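Writing out that one equation (with, as in the example, reward $4 on both transitions, gamma = 1, and V pi of end = 0):

```latex
\[
V_\pi(\text{in})
= \tfrac{1}{3}\big(4 + V_\pi(\text{end})\big) + \tfrac{2}{3}\big(4 + V_\pi(\text{in})\big)
= 4 + \tfrac{2}{3}\, V_\pi(\text{in})
\;\;\Longrightarrow\;\;
\tfrac{1}{3}\, V_\pi(\text{in}) = 4
\;\;\Longrightarrow\;\;
V_\pi(\text{in}) = 12.
\]
```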
[00:51:35] So when I define my policy, I need to choose an action for that state: in state in, my policy says you either stay or you quit. [00:51:48] All right, so you can basically do the same thing using an iterative algorithm too. In the previous example it was kind of simple and I just solved for the closed-form solution, but in reality you might have many different states, and then it might be a little more complicated. So we can have an iterative algorithm that allows us to find these V pis. The way we do that is we start with the values for all states equal to 0, and the superscript 0 I put here is the iteration index. So I'm just going to initialize the values of all states to zero, and then iterate for however many steps I'd like to.
[00:52:35] Then what I'm going to do is, for every state — again, remember the value needs to be computed for every state — update my value using the same equation that I have on the board. That equation depends on the value at the previous iteration, so this is just an iterative algorithm that computes new values based on the previous values. I initialize everything to zero, then keep updating the values of all states, and keep going. So it's basically that same equation, but think of it as an iterative update: you run this for multiple rounds, and every round you just update your values. [00:53:22] Here is a pictorial way of looking at it. Imagine you have five states. You initialize all of them to zero; in the first round you get some values and update them; then you keep running this, and eventually you can see that the last two columns are close to each other and you have converged to the true values. So again, someone comes and gives you the policy, you start with values equal to zero for all the states, and then you just update based on your previous values. [00:53:55] Yeah, so how long should we run this? Well, we have a heuristic to figure out how long to run this particular algorithm. One thing you can do is keep track of the difference between your values at the previous iteration and this iteration. If the difference is below some threshold, you can call it done and say you've found the right values. In this case we're looking at the difference between the value at iteration t and the value at iteration t minus 1, and taking the max of that over all possible states, because I want the values to be close for all states. [00:54:33] Is this clear? So, on convergence and the discount factor: how long you should run this to get convergence is also a difficult problem, and it depends on the properties of your MDP. If you have an ergodic MDP, this should work okay, but in general it's a hard question to answer for general Markov decision processes. [00:54:59] Another thing to notice here is that I'm not storing that whole table. The only things I'm storing are the last two columns of the table — V pi at iteration t and V pi at iteration t minus 1 — because those let me check convergence and keep going: I only need my previous values to update my values. [00:55:23] In terms of complexity, this is going to take order of T times S times S prime. Why is that? Because I'm iterating over T time steps, I'm iterating over all my states, and I'm summing over all s primes. So that's the complexity I get. And one thing to notice is that it doesn't depend on the number of actions, and the reason is that you have given me the policy. If you've given me the policy, I don't really need to worry about the number of actions I have. [00:56:05] All right, here is the same example that we have seen: at iteration t equal to 1, in gets the value 4 and end gets 0, and at iteration 2, in gets a slightly better value.
And finally, at an iteration like 100, let's say we get the value 12. Remember, for this particular example we were able to solve the closed form for V of the stay policy from state in, but you could also run the iterative algorithm and get the same value of 12. [00:56:45] Is the number of actions the size of s prime? No. You might end up in very different states, and that depends on your probabilities; the size of s prime is really the size of the set of states — in the worst-case scenario you can go from every state to every state, so just imagine it's the size of S. [00:57:07] Okay, here's the summary so far. Where are we? We have talked about MDPs: these are graphs with states and chance nodes and transition probabilities and rewards. And we have talked about a policy as the solution to an MDP,
which is this function that takes a state and gives us an action okay we [00:57:29] a state and gives us an action okay we talked about value of a policy so value [00:57:31] talked about value of a policy so value of a policy is the expected utility of [00:57:34] of a policy is the expected utility of that policy so so if you talk about [00:57:36] that policy so so if you talk about utility like you have these random [00:57:39] utility like you have these random values before all these random paths [00:57:41] values before all these random paths that you're gonna get for every policy [00:57:42] that you're gonna get for every policy the value of utility is just an [00:57:44] the value of utility is just an expectation over all those random random [00:57:47] expectation over all those random random variables and so far we've talked about [00:57:49] variables and so far we've talked about this idea of policy evaluation which is [00:57:52] this idea of policy evaluation which is just an iterative algorithm to compute [00:57:54] just an iterative algorithm to compute what's the value of a state if you give [00:57:57] what's the value of a state if you give me some policy like how good is that [00:57:58] me some policy like how good is that policy what's the value I'm gonna get at [00:58:00] policy what's the value I'm gonna get at every state all right so that has been [00:58:05] every state all right so that has been all assuming you give me the policy now [00:58:07] all assuming you give me the policy now the thing I want to spend a little bit [00:58:09] the thing I want to spend a little bit of time on is figuring out how to find [00:58:11] of time on is figuring out how to find that policy here we only have a stay or [00:58:22] that policy here we only have a stay or quit if you have a different problem [00:58:24] quit if you have a different problem that they can learn another actually [00:58:27] that they can learn another actually state way or something trade is 
going to [00:58:32] state way or something trade is going to change the value of the policy [00:58:34] change the value of the policy because then you have a new action and [00:58:36] because then you have a new action and then you need to update our policies so [00:58:38] then you need to update our policies so in this case so far I'm assuming that [00:58:40] in this case so far I'm assuming that the set of actions is fixed I'm not like [00:58:42] the set of actions is fixed I'm not like adding new actions right like the way [00:58:44] adding new actions right like the way even with search problems like the way [00:58:45] even with search problems like the way we defined search problems or the way we [00:58:47] we defined search problems or the way we are defining MVPs is I am saying like [00:58:50] are defining MVPs is I am saying like I'm starting with a set up where states [00:58:52] I'm starting with a set up where states are fixed actions are fixed I have stay [00:58:54] are fixed actions are fixed I have stay and create those are like the only [00:58:55] and create those are like the only actions I can take the reward is fixed [00:58:58] actions I can take the reward is fixed transition probabilities are fixed under [00:59:00] transition probabilities are fixed under that scenario then what is the best the [00:59:03] that scenario then what is the best the best policy I can take and best policy [00:59:05] best policy I can take and best policy is just from those set up like they've [00:59:06] is just from those set up like they've already defined actions okay next [00:59:10] already defined actions okay next lecture we will talk about unknown [00:59:12] lecture we will talk about unknown settings like when we have transition [00:59:13] settings like when we have transition probabilities that are not known or [00:59:15] probabilities that are not known or reward functions that are not known and [00:59:16] reward functions that are not known and how we go 
about learning them, [00:59:18] and that would be the reinforcement learning lecture, so next lecture might address some of that. All right, so let's talk about value iteration. So that was policy evaluation; that whole thing was evaluation. Now what I would like to do is get the maximum expected utility and find the policy that gets me the maximum expected utility. Okay, so to do that, I'm going to define this thing that's called the optimal value. Instead of the value of a particular policy, I just want the optimal value, which is the maximum value attained by any policy. So you might have a bunch of different policies; I just want the policy that maximizes the value. Okay, and that is V opt. So let me go back to this example: in parallel to this example of policy evaluation, I want to do value iteration, okay.
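In symbols (a small sketch in the course's usual notation, not written on the board here), the optimal value of a state is the best value any policy achieves from that state:

```latex
V_{\text{opt}}(s) \;=\; \max_{\pi} \, V_{\pi}(s)
```

Value iteration computes this maximum directly, without ever enumerating the policies.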
So I'm going to start from state s [01:00:13] again. State s has V opt of s; okay, that is what I would like to find. Here I had V pi of s; if I'm looking for V opt of s, then I can have multiple actions that can come out of here, and I don't know which one to take. But if I take any of them, if I take this one, it takes me to a chance node (s, a), okay, and then I'm looking for Q opt of (s, a). And from here it's actually pretty similar to what we had right here: I'm in a chance node, anything can happen, right, nature plays, and with some transition probability I'm going to end up in some new state s prime, and I care about the V opt of that. So if I'm looking for this optimal policy, which comes from this optimal value, then I need to find V opt. And if I want to find V opt, well, that depends on what action I'm taking here, but let's say I take one of these,
and if I take one of these, I end up [01:01:25] in a chance node; I have Q opt of that chance node, and then from that point on, with whatever probabilities, I can end up in some s prime. Okay, so I want to write the recurrence for this, similar to the recurrence that we wrote here; it's going to be actually very similar. Okay, so I'm going to start with Q because that is easier. So what is Q opt of (s, a)? That just seems very similar to this previous case. What is it equal to? What was Q pi? Q pi was just a sum of transition probabilities times rewards, right? So what is Q opt? Yeah, so it would just be basically this equation, except I'm going to replace V pi with V opt. So from the chance node I can end up anywhere, based on the transition probabilities, so I'm going to sum over s primes, all possible places that I can end up at. I'm going to get an immediate reward, which is R(s, a, s prime),
and I'm going to discount [01:02:24] the future, but the value of the future is V opt of s prime. Okay, so far so good; that's Q opt. How about V opt, what is that equal to? Well, it's going to be equal to zero if you're in an end state; that's similar to before. So if IsEnd(s) is true, then it is zero. Otherwise, I have a bunch of options here, right? I can take any of these actions and I can get any Q opt. So which one should I pick? Which Q opt should I pick? The one that maximizes, right: I should pick an action from the set of actions of that state that maximizes Q opt. So the only thing that has changed here is that before, someone told me what the policy is, and I just took the Q of that; here I'm just picking the maximum value of Q, and that actually tells me what action to pick. So what is the optimal policy? What should the optimal policy be? I'm going to
call it pi opt. [01:03:48] What is that equal to? It's got to be the thing that maximizes V, right, which is the thing that maximizes this Q, because that gives me the action. So it's going to be the argmax of Q opt of (s, a), where a is in Actions(s). Okay, all right. So this one is policy evaluation: someone gave me the policy, and with that policy I was able to compute V, I was able to compute Q, I was able to write this recurrence, and then I had an iterative algorithm to do things. This is called value iteration; this is to find the policy. How do I do that? Well, I have a value, V opt, for the optimal value that I can get, and it's going to be the maximum over all possible actions I can take of the Q values, and the Q values are similar to before. So I have this recurrence now, and then the optimal policy is just an argmax of Q.
Yeah, [01:05:09] so the question is: what if I have two a's that give me the same thing? I can return any of them; it depends on your implementation of max, so you can return any of them. (We're running five minutes over.) Okay, so the good news is that the slides are the same things that I have on the board. So Q opt is just equal to the sum that we've talked about; for V opt I just add the max on top of Q opt, same story. Okay, and then if I want the policy, then I just do the argmax of Q opt, and that gives me the policy, right. I can have, again, an iterative algorithm that does the same thing; it's actually quite similar to the iterative algorithm for policy evaluation. I just start by setting everything equal to zero, I iterate for some number of times, I go over all possible states, and then I just update my value based on this new recurrence that has a max.
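Written out, the recurrences being recapped here are, in the lecture's notation (T for transition probability, gamma for the discount):

```latex
\begin{aligned}
Q_{\text{opt}}(s, a) &= \sum_{s'} T(s, a, s')\,\bigl[\text{Reward}(s, a, s') + \gamma\, V_{\text{opt}}(s')\bigr] \\
V_{\text{opt}}(s) &=
\begin{cases}
0 & \text{if } \text{IsEnd}(s) \\
\max_{a \in \text{Actions}(s)} Q_{\text{opt}}(s, a) & \text{otherwise}
\end{cases} \\
\pi_{\text{opt}}(s) &= \arg\max_{a \in \text{Actions}(s)} Q_{\text{opt}}(s, a)
\end{aligned}
```

Compared with policy evaluation, the only change is the max over actions in place of following the given policy's action.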
Very [01:06:06] similar to before, I just do this update. One thing is that the time complexity is going to be order of T times S times A times S prime, because now I have this max over all possible actions, so I'm actually iterating over all possible actions, whereas in policy evaluation I didn't have that max; someone would give me the policy, so I didn't need to worry about this. All right, so let's look at coding this up real quick. Okay, so we have this MDP problem we defined; it was a tram problem, it was probabilistic, everything about it was great. So now I just want to do an algorithm section, an inference section, where I code up value iteration, and I can call value iteration on this MDP problem to get the optimal policy. Okay, so I'm going to call value iteration later. All right. [01:07:07] So we initialize: all the values are going to become... (I might skip things to make this
faster.) So we're going to [01:07:15] initialize all the values to just zero, right, because all these values are going to start at 0. I defined a states function, so for all of those the value is just going to be equal to 0; let's initialize with that. Then we're just going to iterate for some number of times, and what we want to do is compute this new value given the old values. So it's an iterative algorithm: we have old values, and we just update the new values based on them. So what should that be equal to? We iterate over our states; if you're in an end state, then what is the value equal to? 0, right. If you're not in an end state, then you're just going to do that recurrence there. Okay, so the new value of a state is going to be equal to the max of what? The Q values. Okay, so new V is just the max of Q over the state's actions. Okay, so now I need to define Q. What does Q do here? Q of a state and an
action is just equal to that sum over [01:08:29] s primes. So it's going to return a sum, a sum over s primes. I defined this successor probability and reward function that gives me the new state, the probability, and the reward, so I'm going to iterate over that and call that up here. So given that I have a state and an action, I can get the new state, the probability, and the reward. What are we summing? You're summing the transition probabilities times the quantity: the immediate reward, which is the reward here, plus my discount times my V, which is the old value of V at s prime, my new state. So that is my Q, that is my V, and that's pretty much done. We just need to check for convergence. To check for convergence, we kind of do the same thing as before: we check if the values V and new V are close enough to each other that we can call it done. I'm going to skip these parts, so you
can basically check whether V minus new V [01:09:32] is within some threshold for all states; if it is, we're done, and otherwise V is set equal to new V. Then we need to read off the policy. So the policy is just the argmax of Q. I'm going to make this a little faster: the policy is just going to be, well, None if you're in an end state, and otherwise it's just going to be the argmax of our Q values. So I'm just writing argmax here; pretty much I'm just returning the action that maximizes the Q. And then we need to spend a bunch of time getting the printing working, so let me actually get... yeah, okay, all right, actually right here. So I'm running this function, and I'm writing out (these are shifted a little weirdly) the states, the values, and then pi, which is the policy. Okay, so it starts off walk, walk, walk. Remember, this is the case where we have a 50 percent probability of the tram failing and a 50 percent probability of the tram working.
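The routine just walked through can be sketched roughly as follows. The method names (states, actions, isEnd, discount, succProbReward) approximate the tram MDP interface described in the lecture; they are assumptions for illustration, not the exact course scaffold:

```python
def valueIteration(mdp, epsilon=1e-10):
    # Initialize all the values to zero.
    V = {state: 0.0 for state in mdp.states()}

    def Q(state, action):
        # Sum over s': transition probability times
        # (immediate reward + discount * old value of s').
        return sum(prob * (reward + mdp.discount() * V[newState])
                   for newState, prob, reward in mdp.succProbReward(state, action))

    while True:
        # Compute the new value of every state from the old values.
        newV = {}
        for state in mdp.states():
            if mdp.isEnd(state):
                newV[state] = 0.0
            else:
                newV[state] = max(Q(state, action) for action in mdp.actions(state))
        # Convergence check: stop once no value moved more than epsilon.
        if max(abs(V[state] - newV[state]) for state in mdp.states()) < epsilon:
            break
        V = newV

    # Read off the policy: argmax over actions of the Q values.
    pi = {}
    for state in mdp.states():
        if mdp.isEnd(state):
            pi[state] = None
        else:
            pi[state] = max(mdp.actions(state), key=lambda a: Q(state, a))
    return V, pi
```

On a problem like the tram MDP, the returned pi is exactly the walk/tram policy discussed here; ties between equally good actions are broken by whichever one max happens to return.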
These are the values [01:10:41] we're going to get, and the policy is still to walk until state five and then take the tram from state five, okay, which is kind of interesting, because the policy of the search problem was the same thing too. Okay, so the thing we can do, actually (let me move this forward a little bit), is we can define this fail probability, which becomes just a variable, so you can play around with this. If you pick different fail probabilities, you're going to get different policies. So, for example, if you pick a fail probability that is large, then the policy is probably going to be to just walk and never take the tram, because the tram is failing all the time; but if you decide to take a fail probability that's close to zero, then this is your optimal policy, which is close to the search problem; it's basically the solution to the search
problem. So [01:11:35] play around with this; the code is online. This was just value iteration, value iteration in use on this problem. Okay, so I'm going to skip this one too. All right, so yeah, and then this is also showing how, over multiple iterations, you can kind of get to the optimal value and the optimal policy using value iteration. So at one iteration it hasn't seen it yet, so it thinks that the optimal value is 1.85; it hasn't updated the values. And at, I don't know, three iterations it gets better, but it still hasn't updated; it still thinks it can't get to the other side. And remember, this is a fail probability of 10 percent. But if I get to, I think, 10 iterations, then it eventually learns that the best policy is to get to 20, and the value is 13.68. And if you go to even higher iterations, after that point it's just fine-tuning, so the values stay around 13. So you can play
around with it, okay, [01:12:41] no problem. Okay, so when does this converge? So, if your discount factor is less than 1, or your MDP graph is acyclic, then this is going to converge. If the MDP graph is acyclic, that's kind of obvious: you're just doing dynamic programming over the full thing, so that's going to converge. If you have cycles, you want your discount to be less than 1, because if you have cycles and your discount is, let's say, 1, and let's say you're getting zero rewards, then you're never going to change; you're never going to move from your state, you're always going to be stuck in your state. And if you have nonzero rewards, you're going to get this unbounded reward and keep going, because you have cycles, and it's just going to end up becoming numerically unstable. So just a good rule of thumb is: pick a gamma that's less than 1.
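One way to see the rule of thumb: if every reward has magnitude at most R_max (an assumed bound, not something from the tram problem itself), then even the utility of an infinite path stays bounded when the discount is below 1:

```latex
\left|\sum_{t=1}^{\infty} \gamma^{t-1} r_t\right|
\;\le\; \sum_{t=1}^{\infty} \gamma^{t-1} R_{\max}
\;=\; \frac{R_{\max}}{1-\gamma}
\qquad (\gamma < 1),
```

whereas at gamma equal to 1, a cycle that keeps collecting nonzero reward makes this sum diverge, which is exactly the unbounded-reward situation just described.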
Then you [01:13:31] kind of get this convergence property, okay. All right, so, yeah, the summary so far: we have MDPs now; we've talked about finding policies rather than paths. Policy evaluation is just a way of computing how good a policy is, and the reason I talk about policy evaluation is that there's this other algorithm called policy iteration, which uses policy evaluation. We didn't discuss that in the class, but it's kind of... not equivalent, but you could use it in a similar manner as value iteration; it has its pros and cons, and so policy evaluation is used in those settings. Do not leave, please, we have more stuff to cover! We have value iteration, which computes this optimal value, which is the maximum expected utility. Okay, and next time we're going to talk about reinforcement learning, and that's going to be awesome; we'll talk about unknown rewards. All right, so that was
MDPs: [01:14:30] doing inference on them and kind of defining them. I'm going back to the last lecture now, just to talk about some of the stuff that we didn't cover last time. Okay, all right. So, if you remember, last time we were talking about search problems, these big search problems where we don't have probabilities, and we talked about A* as a way of just making things faster, and we talked about this idea of relaxations, which was a way of finding good heuristics. So A* had this heuristic; the heuristic was an estimate of the future cost. We wanted to figure out how to find these heuristics, like how do we go about finding this heuristic, and one idea was just to relax everything. That allows you to come up with an easier search problem, or just an easier problem, and that helps you to find what the heuristic is. Okay, so we talked about this idea of removing
constraints, and when you remove [01:15:21] constraints, you can end up in nice situations: in some settings you have a closed-form solution, in some other settings you have just an easier search problem and you can solve that, and in some other settings you have independent subproblems. So when you remove constraints, you have this easier problem; you can solve that easier problem, and that gives you a heuristic. You're not done yet, right? You have a heuristic; you take that heuristic, then change your costs, and just run uniform cost search on your original problem. So solving an easier problem... you're not done when you solve the easier problem; it just helps you find the thing that helps for the original problem. So it's kind of like a multi-step thing. So an example of that is: if you have walls, remove all the walls, and you have an easier problem.
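The "change your costs and run uniform cost search" step is the usual A* construction from the search lectures: given a heuristic h obtained from the relaxed problem, run uniform cost search on the modified edge costs

```latex
\text{Cost}'(s, a) \;=\; \text{Cost}(s, a) + h(\text{Succ}(s, a)) - h(s),
```

so that, roughly, a consistent heuristic keeps these modified costs nonnegative, and every path's total cost shifts by the same constant h(s_end) - h(s_start), which means the minimum-cost path of the original problem is unchanged.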
If you solve [01:16:08] that easier problem, that gives you a heuristic, and in this case, when you knock down these walls, that easier problem has a closed-form solution; you don't need to do anything fancy, you don't need to do uniform cost search or any of that, you just compute the Manhattan distance, and that gives you a heuristic. With that heuristic you go and solve your original problem. That was one example. Another example is when you remove constraints and you have an easier search problem: you don't have closed-form solutions, but you have an easier search problem. So you might have a really difficult search problem with a bunch of constraints that are hard to deal with; remove the constraints. When you remove the constraints, you have a relaxed problem, which is just the original problem without the constraint. That's a search problem; you
You can solve that search problem using uniform cost search or dynamic programming, and solving that allows you to find the heuristic. Again, you're not done yet, right? You take the heuristic, go to the original problem, change the costs, and run uniform cost search. [01:17:05] And one quick example here: when you're computing these relaxed problems, the thing you want to find is the future cost of the relaxed problem, and to do that you have this easier search problem — you still need to run uniform cost search or dynamic programming. In this case, if you decide to run uniform cost search, remember that uniform cost search computes past costs, and here I really want to compute future costs, so you need to do a bunch of engineering to get that working. In this particular case, you need to reverse the relaxed problem.
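The reversal trick can be sketched on the course's tram problem. This is a sketch under assumptions: it takes the walk (cost 1, s → s+1) and tram (cost 2, s → 2s) actions from the earlier lectures, reverses them, and runs Dijkstra-style uniform cost search from the reversed start state N, so that past costs of the reversed problem are future costs of the original:

```python
import heapq

def future_costs(N):
    """Past costs in the reversed problem = future costs in the original.
    Reversed actions: walk s -> s-1 (cost 1), tram s -> s//2 (cost 2, s even)."""
    dist = {N: 0}
    pq = [(0, N)]
    while pq:
        d, s = heapq.heappop(pq)
        if d > dist.get(s, float("inf")):
            continue  # stale queue entry
        succs = []
        if s - 1 >= 1:
            succs.append((s - 1, 1))   # reversed walk
        if s % 2 == 0:
            succs.append((s // 2, 2))  # reversed tram
        for t, c in succs:
            if d + c < dist.get(t, float("inf")):
                dist[t] = d + c
                heapq.heappush(pq, (d + c, t))
    return dist
```

For example, `future_costs(10)[1]` is the cheapest cost of getting from state 1 to state 10 in the original problem.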
When you reverse it, the past cost of the reversed relaxed problem becomes the future cost of the relaxed problem, if that makes sense. [01:17:47] So the way I'm reversing this is I'm basically saying the start state is N and the end state is 1, and my walk action takes me to s minus 1 instead of s plus 1, and my tram action takes me to s over 2 instead of s times 2. The whole reason I'm doing that is that the past cost of this new problem is the future cost of the non-reversed version, okay, because I need to use uniform cost search here. So I run my uniform cost search, that gives me a heuristic, and that heuristic gives me the future cost of the relaxed problem, and everything will be great. [01:18:18] Another example is that I can have independent subproblems. So in this case we have these tiles, and they technically cannot overlap; instead, what we are allowing is for them to overlap.
If we allow them to overlap, I have eight independent subproblems that I can solve; those subproblems give me heuristics, and I can just go with them. [01:18:39] Okay, so these were just a bunch of examples, and the key idea was reducing edge costs: when we come up with these relaxed problems, you're reducing edge costs from infinity to some finite cost. So I'm getting rid of walls: before, I couldn't cross — the cost of that was infinity — but if I get rid of the wall, I'm making it a finite cost. [01:19:02] So this type of method is a general framework. The point I want to make is that generally you can talk about the relaxation of a search problem: if you have a search problem P, a relaxation of that search problem, which I'm going to call P_rel, is going to be a problem where the cost of the relaxation for any state and action is less than or equal to the cost of that state and action. I'll take questions afterwards.
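Written out, the relaxation condition just stated is:

```latex
% P_rel is a relaxation of the search problem P if, for every state s and action a,
\text{Cost}_{\text{rel}}(s, a) \le \text{Cost}(s, a).
```

Removing a wall is a special case: the edge cost drops from infinity to a finite value.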
All right, so that is a relaxed problem. [01:19:28] The cool thing about that is, if you're given a relaxed problem, then you can pick your heuristic to be the future cost of the relaxed problem, and that is called the relaxed heuristic. So this is kind of a recipe, a general framework: if someone asks you to find a good heuristic — find a relaxed problem; the future cost of the relaxed problem is a heuristic. And the cool thing about that is it turns out that the future cost of the relaxed problem, used as a heuristic, is also consistent. We talked about all these consistency properties, and how you want your heuristic to be consistent for the solution to be correct — and how in the world am I going to find a consistent heuristic? Well, here is one way of finding consistent heuristics: pick your problem, make it relaxed.
Making it relaxed means picking a cost that's less: if we can pick a relaxed problem where the cost is less than the cost of the original problem, then the future cost of that relaxed problem is just going to be your heuristic, and it's going to be consistent. The proof of that is two lines; I'll skip that. [01:20:30] And the thing that's nice about this is that there's a trade-off here, a trade-off between efficiency and tightness. Sure, making things relaxed and removing constraints is kind of fun, right? You have this easier problem, you just solve it, and everything is great. But there is a trade-off with how tight you want your heuristic to be: you shouldn't remove too many constraints, because if you remove too many constraints, then your heuristic is not a good estimate of the future cost.
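For completeness, the two-line proof that gets skipped here is the standard one. For any state s and action a, with successor Succ(s, a):

```latex
h(s) = \text{FutureCost}_{\text{rel}}(s)
     \le \text{Cost}_{\text{rel}}(s, a) + \text{FutureCost}_{\text{rel}}(\text{Succ}(s, a))
     \le \text{Cost}(s, a) + h(\text{Succ}(s, a))
```

The first inequality holds because following a and then acting optimally in the relaxed problem is one particular way of proceeding from s; the second is just the relaxation condition. Together with h(s) = 0 at end states, this is exactly the definition of consistency.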
Remember, your heuristic is supposed to be an estimate of the future cost, so if it is not a good estimate of the future cost and it's not tight, then it's not that great. [01:21:04] So there is a balance between how many of your constraints you're removing — and how that makes finding the heuristic easier — versus the fact that you want your heuristic to be tight and close to your future costs. So don't remove everything; leave some constraints and then solve it. And you can also do things like this: if you have two heuristics that are both consistent, you can take the max of them. The max is a little bit tighter — maybe it's closer to your future costs — and you can actually show that the max of two consistent heuristics is also consistent. [01:21:39] Okay, so we talked about relaxation and A*. One quick thing I want
to mention, because it wasn't very clear last time, is the structured perceptron. We talked about that a little bit, and we talked about its convergence. So, quick things on that: the structured perceptron actually converges. [01:21:53] There was this question: if you have a path that is, let's say, walk–tram, and we end up recovering another path that is tram–walk, is that bad or is that good? Well, it turns out that the costs of both of these paths are the same, so if I end up getting this path, that's perfectly fine — it is also optimal under the same optimal weights. In the tram example we showed, I don't think we are able to get two paths that look like this, because of the nature of the example. [01:22:26] So in general, the things to remember about the structured perceptron: it does converge.
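As a reminder of the algorithm under discussion, the structured perceptron update — in its standard form, not verbatim from the lecture — nudges the weights toward the features of the true path and away from the features of the predicted one:

```python
def perceptron_update(w, phi_true, phi_pred):
    """One structured perceptron step: w <- w + phi(true path) - phi(predicted path).
    All three arguments are feature vectors represented as lists of floats.
    If the prediction already matches the truth, the update is a no-op."""
    return [wi + t - p for wi, t, p in zip(w, phi_true, phi_pred)]
```

When two different paths have identical features (like walk–tram vs. tram–walk with the same counts), the update cannot — and need not — distinguish them, which is the point being made above.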
It converges in a way that it can recover the true weights, but it doesn't necessarily get the exact w's, as we saw last time: you might get two and four, or you might get four and eight. As long as you have the same relationships, that is enough; you're not necessarily going to get the actual weights, and it does converge. [01:22:47] So with that, the project conversation is going to be next time. Do take a look at the website — all the information on the project is on the website — so start thinking about it and look at the project page.
================================================================================
LECTURE 020
================================================================================
Markov Decision Processes 2 - Reinforcement Learning | Stanford CS221: AI (Autumn 2019)
Source: https://www.youtube.com/watch?v=HpaHTfY52RQ
---
Transcript
[00:00:04] So this lecture is going to be on reinforcement learning. I will, in the interest of time, skip the quiz. The way to think about how reinforcement learning fits into what we've done so far is: you
remember this class has this picture, right? We talked about different models, and we talked about different algorithms — inference algorithms to be able to predict using these models and answer queries — and then we have learning, which is how you actually learn these models. So for every type of model we go through, we have to check the boxes for each of these pieces. [00:00:44] Last lecture we talked about Markov decision processes. This is a modeling framework that allows you to define models, for example for crossing volcanoes, or playing dice games, or taking trams. What about inference — what do we have here? Last time we just had value iteration, which allows you to compute the optimal policy, and policy evaluation, which allows you to estimate the value of a particular policy. So these are algorithms that operate on the MDP, right, and we looked at these
right and we looked at these algorithms last time so this lecture is [00:01:24] algorithms last time so this lecture is going to be about learning [00:01:26] going to be about learning I'll just put RL for now RL is not an [00:01:30] I'll just put RL for now RL is not an algorithm it's a kind of refers to the [00:01:32] algorithm it's a kind of refers to the family of algorithms that fits in this [00:01:35] family of algorithms that fits in this week but that's a way you should think [00:01:37] week but that's a way you should think about it RL allows you to either [00:01:40] about it RL allows you to either explicitly or implicitly as to my MVP s [00:01:42] explicitly or implicitly as to my MVP s and then once you have that you can do [00:01:44] and then once you have that you can do all these inference algorithms to figure [00:01:49] all these inference algorithms to figure out what the optimal policy is okay so [00:01:53] out what the optimal policy is okay so just to review so what is the MVP the [00:01:58] just to review so what is the MVP the clearest way remember to think about it [00:02:01] clearest way remember to think about it is it's in terms of a graph so you have [00:02:04] is it's in terms of a graph so you have a set of states so in this dice game we [00:02:07] a set of states so in this dice game we have in and n so we have a set of states [00:02:10] have in and n so we have a set of states from every state you have a set of [00:02:13] from every state you have a set of actions coming out so in this case stay [00:02:18] actions coming out so in this case stay and quit actions take you to chance [00:02:22] and quit actions take you to chance nodes where the you don't get to control [00:02:26] nodes where the you don't get to control what happens but nature does and there's [00:02:29] what happens but nature does and there's randomness so out of these chance nodes [00:02:31] randomness so out of these chance nodes are transitions each transition 
takes you into a state. It has some probability associated with it — two-thirds in this case — and it also has some reward associated with it, which you pick up along the way. So naturally this one has to be 1/3, 4, and remember from last time this one was probability 1, 10. [00:02:56] And then there is also the discount factor gamma, a number between 0 and 1 that tells you how much you value the future; by default you can think of it as 1, for simplicity. Okay, so this is a Markov decision process, and what do you do with one of these things? We have a notion of a policy — let's see, I'll write it over here. A policy, denoted pi (let me use green), is a mapping from states to actions. When you apply it, it says: when I land here, where should I go — should I do stay or quit? Well, I mean, this is kind of a simple MDP.
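The dice game just described can be written down as a tiny MDP. This is a sketch assuming the numbers from the course's running example (stay: reward 4, probability 2/3 of returning to "in"; quit: reward 10, game ends):

```python
import random

# Transition table: (state, action) -> list of (probability, next_state, reward).
TRANSITIONS = {
    ("in", "stay"): [(2 / 3, "in", 4), (1 / 3, "end", 4)],
    ("in", "quit"): [(1.0, "end", 10)],
}

def sample_episode(policy, seed=0):
    """Run a policy from 'in' until 'end'; return the list of rewards picked up."""
    rng = random.Random(seed)
    s, rewards = "in", []
    while s != "end":
        a = policy(s)
        r, cum = rng.random(), 0.0
        for p, s2, rew in TRANSITIONS[(s, a)]:
            cum += p
            if r <= cum:          # sample the chance-node outcome
                rewards.append(rew)
                s = s2
                break
    return rewards
```

For example, `sample_episode(lambda s: "quit")` always yields a single reward of 10, while a stay policy yields a random-length run of 4s.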
Otherwise there would usually be more states, and for every state — blue circle — it will tell you where to go. [00:03:53] And when you run a policy, what happens? You get a path, which I'm going to call an episode. So what do you do? You start in state s0 — in this particular example that would be "in" — you take an action a1, let's say stay, you get some reward — in this case it would be four — and you end up in a new state s1; suppose you go back to "in", and then you take another action, maybe it's stay, the reward is four again, and so on. [00:04:31] So this sequence is a path, or in RL-speak, an episode. Let me erase this comment. So this is an episode, until you hit the end state. And what comes out of an episode? You can look at a utility, which we're going to denote u, which is the discounted sum of the rewards along the way. So if you, you know, stayed three times
and then went there, you would have a utility of four plus four plus four plus four, so that would be sixteen. [00:05:19] So last lecture we didn't really work with the episodes and their utility, because we were able to define a set of recurrences that computed the expected utility. Remember that we don't know what's going to happen, so there's a distribution, and in order to optimize something we have to turn it into a number — that's what expectation does. So there are two concepts that we had from last time. One is the value function of a particular policy: V_pi(s) is the expected utility if you follow pi from s. What does that mean? That means if you take a particular s — let's take "in" — and I put you there and you run the policy, so stay, and you traverse this graph, you will have different utilities coming out, and the average of those is going to be V_pi(s).
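The utility of an episode, as just defined, can be computed directly. A minimal sketch; with gamma = 1 it reproduces the 4 + 4 + 4 + 4 = 16 example:

```python
def utility(rewards, gamma=1.0):
    """Discounted sum of rewards: r1 + gamma*r2 + gamma^2*r3 + ..."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))
```

Note the utility is a property of one sampled episode; V and Q below are expectations of this quantity over episodes.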
Similarly, there's a Q-value: the expected utility if you first take an action from state s and then follow pi. What does that mean? That means if I put you on one of these red chance nodes and you basically play out the game and average the resulting utilities that you get — what number do you get? [00:06:35] And we saw recurrences that related these two. So V_pi(s) is a recurrence — the name of the game is to kind of reduce it to some simpler problem. You first look up what you're supposed to do in s, that's pi(s), and that takes you to a chance node, which is (s, pi(s)), and then you say, hey, how much utility am I going to get from that node? And similarly, from the chance nodes you have to look at all the possible successors: the probability of going into that successor, the immediate reward that you get along the edge, plus the discounted value of the future when you end up in s'.
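The two recurrences being described, in the course's notation:

```latex
V_\pi(s) =
\begin{cases}
0 & \text{if } \text{IsEnd}(s) \\
Q_\pi(s, \pi(s)) & \text{otherwise}
\end{cases}
\qquad
Q_\pi(s, a) = \sum_{s'} T(s, a, s') \left[ \text{Reward}(s, a, s') + \gamma\, V_\pi(s') \right]
```

Plugging in the dice game with the always-stay policy and gamma = 1 gives V_pi(in) = (2/3)(4 + V_pi(in)) + (1/3)(4), i.e. V_pi(in) = 12.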
Okay, so any questions about this? This is kind of a review of Markov decision processes from last time. [00:07:40] Okay, so now we're able to do something different. If you say goodbye to the transitions and rewards, that's called reinforcement learning. Remember, with Markov decision processes I give you everything here and you just have to find the optimal policy; now I'm going to make life difficult by not even telling you what rewards and what transitions you're going to get. [00:08:04] So just to give a flavor of what that's like, let's play a game. I'm going to need a volunteer. I'll give you the game, but this volunteer has to have a lot of grit and persistence, because this is not going to be an easy game — it has to be one of those people who, even though you're losing a lot, still won't give up. Okay, so here's how the game
works. For each round r = 1, 2, 3, 4, 5, 6, and so on, you're just going to choose A or B — red pill or blue pill, I guess — and you move to a new state, so the state is here, and you get some reward, which I'm going to show here. And the initial state is (5, 0). Okay, everything clear about the rules of the game? That's reinforcement learning — we don't know anything about how it works. All right, any volunteers? How about you in the front? [00:09:24] [the volunteer plays the game] [00:10:03] I'm glad this worked, because last time it took a lot longer. So what did you have to do? I mean, you don't know, so you try: you try A and B, and then hopefully you're building an MDP in your head, right? Yeah, right, okay — just smile and nod. You have to figure out how the game works, right? So maybe you noticed that, hey, A is
decrementing and B is going up, but then there's this other bit that gets flipped. So can you figure this out? In the process you're also trying to maximize reward, which apparently doesn't come until the very end, because it's a cruel game. [00:10:44] Okay, so how do we design an algorithm to do this, and how do we think about this? Just to make the contrast between MDPs and reinforcement learning sharper: a Markov decision process is an offline thing, right? You already have a mental model of how the world works — that's the MDP, that's all the rewards and the transitions and the states and actions — and you have to find a policy to collect maximum reward. You have it all in your head, so you just think really hard about, you know, what the best thing is: I know if I do this action I'll go here, and you look at the probabilities,
take the max or whatever. [00:11:21] So reinforcement learning is very different: you don't know how the world works, so you can't just sit there and think, because thinking isn't gonna help you figure out how the world works. So you have to just go out and perform actions in the world, right? And in doing so, hopefully you'll learn something, but also you'll get some rewards. Okay, so let's maybe formalize the paradigm of RL. So you can think about it as an agent, that's you, and you have the environment, which is everything else that's not the agent. The agent takes actions, so the agent sends an action to the environment, and the environment just sends you back a reward and a new state, and you keep on doing this. So what you have to do is figure out, first of all: how am I going to act? If I'm in a particular state s_{t-1}, what action should I choose? Okay, so that's one question.
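The interaction loop just described (agent sends an action; environment sends back a reward and a new state) can be sketched in code. The two-state game below, its 1/3 exit chance, and the reward values are my own illustrative assumptions, loosely modeled on the course's dice game, not the in-class demo:

```python
import random

class Environment:
    """A hypothetical two-state game: states "in" and "end" (both assumed)."""

    def __init__(self):
        self.state = "in"

    def step(self, action):
        # "stay": collect reward 4; with probability 1/3 the episode ends.
        if action == "stay":
            reward = 4
            self.state = "end" if random.random() < 1 / 3 else "in"
        else:  # "quit": assumed here to end the episode with reward 10
            reward = 10
            self.state = "end"
        return reward, self.state

def run_episode(policy, env):
    # The loop: agent sends an action, environment sends back (reward, state).
    episode = []  # list of (state, action, reward, next_state) tuples
    state = env.state
    while state != "end":
        action = policy(state)
        reward, next_state = env.step(action)
        episode.append((state, action, reward, next_state))
        state = next_state
    return episode

episode = run_episode(lambda s: "stay", Environment())
print(episode[-1])  # the final transition always lands in "end"
```

The `policy` argument is exactly the "how am I going to act" question: a function from states to actions.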
And then you're gonna get this reward and observe a new state: [00:12:18] what should I do to update my mental model of the world? Okay, so these are the main two questions. I'm gonna talk first about how to update the parameters, and then later in the lecture I'm gonna come back to how you actually go and, you know, explore. Okay, so I'm not gonna say much here, but, you know, in the context of volcano crossing, just to kind of think through things: every time you play the game, right, you're gonna get some utility. So this is the episode over here, the sequence of actions, rewards, and states; sometimes you, you know, fall into a pit, sometimes you go to a hut, and based on these experiences, if I hadn't told you what any of the actions do, or what the probabilities are, or anything, how would you kind of go about solving this problem? That's the question. Okay, so there's a bunch
of algorithms; [00:13:18] I think there's going to be at least four RL algorithms that we're gonna talk about, with different characteristics, but they're all gonna kind of build onto each other in some way. So the first class of algorithms: Monte Carlo methods. Right, so, okay: whenever you're doing RL, or any sort of learning, the first thing is you just have data. Let's suppose that you run even a random policy, because in the beginning you don't know any better, so you're just gonna try random actions, but in the process you're gonna see: hey, I tried this action and I got this reward, and so on. So on a concrete example, just to make things more crisp, it's going to look something like this. You're in 'in', and then you take... let's see, let me try to color-coordinate this. So you're in 'in', you do stay, and you
get a reward of four, [00:14:22] and then you're back in 'in', you do a stay, and then you get four, and then maybe you're done, you're out. Okay, so this is an example episode, just to make things concrete: so this is s0, a1, r1, s1 (I was incrementing too quickly), a2, r2, s2, okay. Okay, so what should you do here? All right, so, any ideas? Model-based Monte Carlo: so if you had the MDP, you would be done, but we don't have the MDP, we have data, so what can we do? [00:15:09] Yeah: let's try to build an MDP from that data. Okay, so the idea is: estimate the MDP. So intuitively, we just need to figure out what the transitions and rewards are, and then we're done, right? So how do you do the transitions? So the transition says: if I'm in state s and I take action a, what will happen? I don't know what will happen, but let's see in the data what happened: so I can go look at the number of times I went into a particular s' and then
divided by [00:15:46] the number of times I attempted this action from that state at all, and just take the ratio. Okay, and for the rewards, this is actually fairly, you know, easy, because when I observe a reward from (s, a, s'), I just write it down and say that's the reward. Okay. Okay, so on the concrete example, what does this look like? So remember, now here's the MDP graph; I don't know what the transition distribution or the rewards are. So let's suppose I get this trajectory: what should I do? So I get stay, stay, stay, stay, and I'm out, okay. So first I can write down the reward of four here, and then I can estimate the probability of, you know, transitioning: so three out of four times I went back to 'in', one out of four times I went to 'end', so I'm gonna estimate this as 3/4, 1/4. Okay, but then suppose I get a new data point, so I have stay, stay, and so what do I do?
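The count-based estimates just described can be sketched as follows; a minimal sketch, where the 'in'/'end' state names and the stay action follow the example on the slide:

```python
from collections import defaultdict

counts = defaultdict(int)        # (s, a, s') -> number of observed transitions
totals = defaultdict(int)        # (s, a)     -> number of times a was tried in s
reward_sums = defaultdict(float) # (s, a, s') -> sum of observed rewards

def observe(s, a, r, s_next):
    counts[(s, a, s_next)] += 1
    totals[(s, a)] += 1
    reward_sums[(s, a, s_next)] += r

def T_hat(s, a, s_next):
    # Estimated transition probability: count of (s, a, s') over count of (s, a).
    return counts[(s, a, s_next)] / totals[(s, a)]

def R_hat(s, a, s_next):
    # Estimated reward: the average observed reward (in case rewards vary).
    return reward_sums[(s, a, s_next)] / counts[(s, a, s_next)]

# The lecture's trajectory: stay four times, landing back in "in" three times,
# then out to "end"; every step pays reward 4.
for s_next in ["in", "in", "in", "end"]:
    observe("in", "stay", 4, s_next)

print(T_hat("in", "stay", "in"), T_hat("in", "stay", "end"))  # 0.75 0.25
print(R_hat("in", "stay", "in"))                              # 4.0
```

Because the counts are cumulative, feeding in later episodes with more `observe` calls updates the ratios exactly as described next.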
I can add to these counts, [00:17:00] so everything is kind of cumulative: so two more times I started, one more time I went into 'in', and another time I went to 'end', so this becomes 4 out of 6 and 2 out of 6. And suppose I see another time where I just go into 'end': so I'm just gonna increment this counter, and now it's 3 out of 7 and 4 out of 7. Okay, so pretty, pretty simple. Okay, so for reasons I'm not going to get into, this process actually, you know, converges; if you do this kind of thing a million times, you'll get pretty accurate estimates. That question: do we know the number of states? Yes, so the question is: you don't know the rewards or the transitions, but yes, you do know the set of states and the actions. The set of states, I guess you don't have to know them all in advance, you can just observe them as they come; the actions you need to know, because you're the agent and you need to play the game. Yeah, good
question. [00:18:03] Okay, so, yeah, so the question is: does this work with variable rewards? And if the reward is not a function of (s, a, s'), you would just take the average of the rewards that you see. Yeah. Okay, so what do you do with this? So after you estimate the MDP, so, you know, you needed the transitions and rewards, now we have an MDP in mind. It may not be the exact right MDP, because this is estimated from data, so it's not gonna match exactly, but nonetheless we already have these tools from last time: you can do value iteration to compute the optimal policy on it, and then, you know, you're done, you run it. In practice you would probably kind of interleave the learning and the optimization, but for simplicity we can think about it as two stages, where you gather a bunch of data, you estimate the MDP, and then you're off. Okay, there's one problem here.
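The two-stage pipeline just described (gather data, estimate the MDP, then run value iteration from last lecture) might look like this on the estimated numbers above; gamma = 1, the iteration count, and the state/action names are my own assumptions for this sketch:

```python
# Value iteration on an estimated MDP (a sketch; T_hat/R_hat are the 3/4, 1/4
# estimates from the worked example, and gamma = 1 is assumed).
T_hat = {("in", "stay"): {"in": 0.75, "end": 0.25}}
R_hat = {("in", "stay", "in"): 4.0, ("in", "stay", "end"): 4.0}
actions = {"in": ["stay"], "end": []}  # "end" is terminal
gamma = 1.0

V = {"in": 0.0, "end": 0.0}
for _ in range(100):  # repeat the Bellman backup until (near) convergence
    V = {s: max((sum(p * (R_hat[(s, a, sp)] + gamma * V[sp])
                     for sp, p in T_hat[(s, a)].items())
                 for a in actions[s]), default=0.0)
         for s in V}
print(round(V["in"], 3))  # 16.0, the value of "in" under the estimated MDP
```

Note the computed value is only as good as the estimated MDP, which is exactly the problem raised next.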
You wanna know what the problem might be? [00:19:17] You can actually see it by looking at the slide. [00:19:25] Yeah: well, with your fixed policy of always staying, you never explore the other branch of the world. Yeah, yeah, you didn't explore this at all, so you actually don't know how much reward is here; maybe it's like, you know, 100, right? So this is actually a pretty big problem: unless you have a policy that actually goes and covers all the states, you just won't know, right? And this is kind of natural, because there could always be, you know, a lot of reward hiding in one state, but unless you see it, you just won't know. Okay, so this is a kind of key idea, a key challenge I would say in reinforcement learning: exploration. So you need to be able to explore the state space. This is different from normal machine learning, where data just
comes in passively and [00:20:20] you learn a nice function and then you're done. Here you actually have to figure out how to get the data, and that's kind of one of the key challenges of RL. So we're gonna go back to this problem, and I'm not really going to try to solve it now. For now, you can just think about pi as a random policy, because with a random policy you eventually will just, you know, hit everything, for, you know, finite small state spaces. Okay, so that's basically the end of the first algorithm; let me just write this over here. So, algorithms: we have model-based Monte Carlo, and the 'model-based' is referring to the fact that we're estimating a model, in particular the MDP; the 'Monte Carlo' part is just referring to the fact that we're using samples to estimate a model, or, you're basically applying the policy multiple times and then estimating the model based on averages.
Okay, [00:21:34] so now I'm going to present a different algorithm, and it's called model-free Monte Carlo. And you might, from the name, guess what we might want to do: maybe we don't have to estimate this model. And why is that? Well, what do we do with this model? What we did was, we, you know, presumably used value iteration to compute the optimal policy. And remember this recurrence for computing Q opt: it's in terms of T and the reward. But at the end of the day, all you need is Q opt. If I told you Q opt of (s, a), and what is Q opt of (s, a)? It's the maximum possible utility I could get if I am in chance node (s, a) and I follow the optimal policy. So clearly, if I knew that, then I would just produce the optimal policy, and you're done; I don't even need to know or understand the rewards and transitions. Okay, so with that insight is model-free
learning, which is that we're just going to try to estimate Q opt, you know, directly. [00:22:52] Sometimes it can be a little bit confusing what is meant by 'model-free': so Q opt itself, you can think about it as a model, but in the context of MDPs and reinforcement learning, generally, when people say model-free, it refers to the fact that there's no MDP model, not that there's no model in general. Okay, so we're not going to get to Q opt yet; that'll come later in the lecture, so let's warm up a little bit. So here's our data, staring at us, remember. Let's look at a related quantity, Q pi. Remember what Q pi is: Q pi of (s, a) is the expected utility if we start at s and you first take action a and then follow policy pi. Right. So, I guess, another way to write this is: if you are at a particular time step t, you can define u_t as the discounted sum of the rewards from that point on, which is,
you know, the reward that you would get immediately, [00:24:00] plus the discounted reward at the next time step, plus the squared-discounted reward two time steps in the future, and so on: u_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + .... And what you can do is, you can try to estimate Q pi from this utility. So this is the utility that you get at a particular time step. So suppose you do the following: suppose you average the utilities that you get, only on the time steps where I was in a particular state s and I took an action a. Okay, so suppose you have a bunch of episodes, right? So here, pictorially, here's another way to think about it: I get a bunch of episodes, and I'm gonna do some abstract drawing here. So every time, you know, (s, a) shows up, maybe it shows up here, maybe it shows up here, maybe it shows up here, you're gonna look at how much reward I get from that point on, how much reward I get from here on,
how much reward do I get from here on, and average them, right? [00:25:13] And there's kind of a technicality, which is that if (s, a) appears here, and it also appears after it, then I'm not going to count that, because if I do both, I'm kind of double counting. In fact it works both ways, but conceptually it's easier to think about if, within the same episode, you don't kind of go back to the same, you know, (s, a) position. Okay, so let's do that on a concrete example. So Q pi, let's just write Q pi of (s, a) as the thing we're trying to estimate, and this is a value associated with every chance node (s, a); so in particular, I've drawn it here, I need a value here and a value here. Okay, so suppose I get some data: I stay, and then I go to the end. So what's my utility here? [00:26:11] It's not a trick question. Four? Yes, the sum of fours is four. Okay, so now I can say: okay, it's four, that's
my best guess so far; [00:26:21] I mean, I haven't seen anything else, maybe it's four. So what happens if I play the game again, and I get four, four? So what's the utility here? Eight. So then I update this to the average of four and eight. Do it again, I get sixteen, then I average in the sixteen. Okay. And, again, you know, I'm using stay, so I don't learn anything about this; in practice you would actually go explore this and figure out how much utility is sitting there. So in particular, notice I'm not updating the rewards nor the transitions, because I'm model-free; I just care about the Q values that I get, which are the values that sit at the nodes, not on the edges. [00:27:06] Okay, so one caveat is that we are estimating Q pi, not Q opt; we'll revisit this later. And another thing to kind of note is the difference between what is called on-policy and off-policy.
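The averaging just described (utilities 4, then 8, then 16, averaged at the chance node) can be sketched as follows; gamma = 1 is assumed so the utilities match the lecture's numbers, and the episode rewards are reconstructed for illustration:

```python
from collections import defaultdict

GAMMA = 1.0  # discount factor; assumed 1 to match the 4, 8, 16 example

def utility(rewards):
    # u_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    return sum((GAMMA ** k) * r for k, r in enumerate(rewards))

# Model-free Monte Carlo: average the utility observed after the first
# occurrence of (s, a) in each episode (a sketch, not the exact class code).
sums = defaultdict(float)
ns = defaultdict(int)

def update(s, a, u):
    sums[(s, a)] += u
    ns[(s, a)] += 1

def Q_hat(s, a):
    return sums[(s, a)] / ns[(s, a)]

# Three episodes from ("in", "stay") with utilities 4, then 8, then 16.
for rewards in [[4], [4, 4], [4, 4, 4, 4]]:
    update("in", "stay", utility(rewards))
print(round(Q_hat("in", "stay"), 3))  # 9.333, i.e. (4 + 8 + 16) / 3
```

No transitions or rewards are stored anywhere: only the Q values at the chance nodes, which is the whole point of being model-free.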
Okay, so in reinforcement learning, [00:27:32] you're always following some policy to get around the world, right? And that's generally called an exploration policy, or the control policy. And then there's usually some other thing that you're trying to estimate, usually the value of a particular policy, and that policy could be the same, or could be different. So on-policy means that we're estimating the value of the policy that we're following to generate the data; off-policy means that we're not. Okay, so, in particular, is model-free Monte Carlo on-policy or off-policy? It's on-policy, because I'm estimating Q pi, not Q opt; okay, that's on-policy. And what about model-based Monte Carlo? I mean, it's a slightly weird question, but in model-based Monte Carlo, we're following some policy, maybe even a random policy, but we're estimating the transitions and rewards, and from that we can compute the
optimal policy. [00:28:44] So you can think about it as off-policy, but, you know, that's maybe not completely standard. Okay, so, any questions about what model-free Monte Carlo is doing? So let me just actually write: what is model-based Monte Carlo doing? It's trying to estimate the transitions and rewards; and model-free Monte Carlo is trying to estimate Q pi. Okay, and just as a note, I put hats on any letter that is supposed to be a quantity that's estimated from data; that's what, you know, statisticians do to differentiate them: whenever I write Q pi without a hat, that's the true value of that, you know, policy, which, you know, I don't have. Okay, any questions about model-free Monte Carlo? Both of these algorithms are pretty simple, right? You just, you know, look at the data and you take averages. Yeah. So the question is: is model-free making changes to a policy,
or is it a fixed box? [00:30:01] So, this version I've given you is only for a fixed policy; the general idea of model-free, as we'll see later, is that you can also optimize the policy. Okay, so now what we're gonna do is theme and variations on model-free Monte Carlo, where it's gonna be the same algorithm, but I just want to interpret it in kind of slightly different ways that will help us generalize it in the future. Yeah: are there problems where model-free doesn't apply, are there certain problems where model-free is better than model-based? So this is actually a really interesting question. So you can show that if your model is correct, if your model of the world is correct, model-based is kind of the way to go, because it will be more sample efficient, meaning that you need fewer data points. But it's really hard to get the model correct in the real world. So recently, especially
the real world so recently especially with neo deep reinforcement learning [00:31:04] with neo deep reinforcement learning people have gotten a lot of mileage by [00:31:07] people have gotten a lot of mileage by just going model free because then jump [00:31:10] just going model free because then jump your head a little bit you can model [00:31:11] your head a little bit you can model this as a kind of a deep neural network [00:31:13] this as a kind of a deep neural network and that gives you extraordinary [00:31:14] and that gives you extraordinary flexibility and power without having to [00:31:17] flexibility and power without having to solve the hard problem of Noah [00:31:18] solve the hard problem of Noah constructing the MDP okay so so there's [00:31:26] constructing the MDP okay so so there's kind of three ways you can think about [00:31:27] kind of three ways you can think about this so the first we already talked [00:31:30] this so the first we already talked about it is you know this average idea [00:31:32] about it is you know this average idea so we're just looking at the utilities [00:31:34] so we're just looking at the utilities that you see whenever you counter in SNA [00:31:37] that you see whenever you counter in SNA and you just average them okay so here [00:31:40] and you just average them okay so here is an equivalent formulation and the way [00:31:45] is an equivalent formulation and the way it works is that for every si you that [00:31:50] it works is that for every si you that you see so every time you see a [00:31:51] you see so every time you see a particular sau sau sau and so on I'm [00:31:56] particular sau sau sau and so on I'm going to perform the following update on [00:31:59] going to perform the following update on so I'm gonna take my existing value and [00:32:01] so I'm gonna take my existing value and I'm going to do a what is called a [00:32:04] I'm going to do a what is called a convex combination so you know 1 - ADA [00:32:07] 
convex combination. [00:32:07] Here 1 − η and η sum to 1, so it's balancing between two things: the old value I had and the new utility u that I saw. And η is set to 1 over (1 plus the number of updates). [00:32:22] So let me do a concrete example; I think it'll make it very clear what's going on. Suppose my data looks like this: I get a 4, then a 1, and a 1. These are the utilities — that's the u here; I'm ignoring the (s, a), I'll just assume it's always the same one. [00:32:42] OK, so first let's assume Q̂_π is zero. The first time, I haven't done any updates yet, so η = 1: (1 − 1) times 0, plus 1 times 4, which is the first u that comes in — so this is 4. [00:33:06] What about the next data point that comes in? Now I take 1/2 times 4 plus 1/2 times 1, which is the new value, and I'm going to write that as (4 + 1)/2. [00:33:26] OK, just to keep track of things: this results in this, this results in this — and now we're running out of space, but hopefully I can fit it. On the third one η is 1/3, so I have 2/3 times (4 + 1)/2, which is the previous value sitting in Q̂_π, plus 1/3 times 1, which is the new value, and that gives me (4 + 1 + 1)/3. [00:34:10] So you can see what's going on here: each time, I have the sum of all the u's I've seen, over the number of times the (s, a) occurs, and η is set so that each update cancels out the old count and adds the new count to the denominator. It all works out so that at every time step, what is actually in Q̂_π is just the plain average of all the numbers I've seen so far.
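The running-average trick just described can be sketched in a few lines of Python (the function name and loop structure are mine, not the course code; only the η = 1/(1 + #updates) schedule comes from the lecture):

```python
def incremental_average(utilities):
    """Maintain q = (1 - eta) * q + eta * u with eta = 1 / (1 + #previous updates).

    Algebraically this is always the plain average of the utilities seen so far.
    """
    q = 0.0  # Q-hat_pi starts at zero, as in the lecture's example
    for num_updates, u in enumerate(utilities, start=1):
        eta = 1.0 / num_updates  # 1 / (1 + number of previous updates)
        q = (1 - eta) * q + eta * u
    return q

# The lecture's data: utilities 4, 1, 1 give (4 + 1 + 1) / 3
print(incremental_average([4, 1, 1]))
```

Stepping through it reproduces the board: 4, then (4 + 1)/2, then (4 + 1 + 1)/3.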
[00:34:43] This is just an algebraic trick to get from the original formulation, which is the notion of an average, to this formulation, which is the notion of taking a little bit of the old thing and adding a little bit of the new thing. [00:35:14] So I'm going to call this the convex combination view — that's the second interpretation. There's a third interpretation, which you can think about in terms of stochastic gradient descent. [00:35:27] This is actually a simple algebraic manipulation: if you look at this expression, you have 1 times Q̂_π, which I'm going to pull out and put down here; then I have minus η times Q̂_π, which is this term; and I also have η times u, so I put a −u here, inside the parentheses.
[00:35:55] If you just do the algebra, you can see that these two are equivalent. So what's the point of this? Where have you seen something like this before — maybe not this exact expression, but something like it? Any ideas? Yeah — [00:36:20] when you looked at stochastic gradient descent in the context of the squared loss for linear regression. Remember we had these updates that all looked like (prediction − target), the residual, and that was used to update. [00:36:37] So one way to interpret this is that it's implicitly doing stochastic gradient descent on an objective which is a squared loss between the Q̂_π value you're trying to set and u, the new piece of data you got. Think about regression: u is the y — the output — and Q̂_π is the model that's trying to predict it, and you want those to be close to each other.
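Written out, the algebra being done on the board is (same symbols as the lecture: η the step size, u the observed utility):

```latex
\hat{Q}_\pi(s,a) \;\leftarrow\; (1-\eta)\,\hat{Q}_\pi(s,a) + \eta\, u
  \;=\; \hat{Q}_\pi(s,a) - \eta\,\big(\hat{Q}_\pi(s,a) - u\big),
```

which is exactly a stochastic gradient step on the squared loss
$\ell = \tfrac{1}{2}\big(\hat{Q}_\pi(s,a) - u\big)^2$, whose gradient with respect to
$\hat{Q}_\pi(s,a)$ is the residual $\hat{Q}_\pi(s,a) - u$.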
[00:37:10] OK, so those are three views on basically this one idea of averaging, or incremental updates. It'll become clear later why I did this — it isn't just to have fun. [00:37:31] So now let's see an example of model-free Monte Carlo in action on the volcano game. Remember we have this as an example, and I'm going to set the number of episodes to, let's say, a thousand. Let's see what happens. [00:37:57] So what does this grid-like structure of triangles denote? This, remember, is a state — this is (2, 1) — and what I'm doing here is dividing it into four pieces corresponding to the four different actions: this triangle is ((2, 1), north), this triangle is ((2, 1), east), and so on. And the number here is the Q̂_π value I'm estimating along the way. [00:38:27] The policy I'm using is completely random — just move randomly — and I've run this a thousand times, and we see that the average utility is minus 18, which is obviously not great. But this is an estimate of how well the random policy is doing, and as advertised, with a random policy you would expect to fall into a volcano quite often. [00:38:58] You can run this and sometimes get slightly different results, but it's pretty much stable around minus 19, minus 18. Any questions about this before we move on to different algorithms? [00:39:15] OK: in model-based Monte Carlo we're estimating the MDP; in model-free Monte Carlo we're just estimating the Q-values of a particular policy — for now. [00:39:31] So let's revisit what model-free Monte Carlo is doing. If you use the policy π equals "stay" for the dice game, you might get a bunch of different trajectories that come out; these are
possible episodes, and each episode has a utility associated with it, and what model-free Monte Carlo is doing is using those utilities to update Q̂_π. [00:40:11] So in particular, for example, here you're saying: I'm in the "in" state and I take the action "stay" — what will happen? Well, in this case I got 16 and in this case I got 12, and notice there's quite a bit of variance. [00:40:29] On average this actually does the right thing: just by definition, this is an unbiased estimate, and if you do this a million times, on average you're going to get the right value, which is 12 in this case. But the variance is huge, so if you only do it a few times you're not going to get 12 — you might get something off to one side. [00:40:53] So how can we counteract all this variance? The key idea behind what we're going to call bootstrapping is that we actually have some more information here: we have this Q̂_π that we're estimating along the way. [00:41:14] So this view is saying: we're trying to estimate Q_π, and we're going to regress it against this data we're seeing — but can we actually use Q̂_π itself to help reduce the variance? [00:41:37] So the idea here is: I'm going to look at all the places where I started in "in" and took "stay", and I get a 4. OK, so I get a 4, but then after that point I'm actually just going to substitute this 11 in. [00:41:55] This is kind of weird, right? Because normally I would just see what happens — and what happens is random: on average it's going to be right, but in any given case I might get, like, you know, 24 or something. The hope is that by using my current estimate — which isn't going to be exactly right, because if it were right I'd be done, but hopefully it's somewhat right — that will be better than using the raw rollout value. [00:42:36] Yeah, question? Yes — the question is: would you update the estimate after each episode? Yes. For all these algorithms — I haven't been explicit about it — you see an episode, you update after you see it, then you get a new episode, and so on. Sometimes you would even update before you're done with the episode. [00:43:00] OK, so let me show you this algorithm. This is a new algorithm called SARSA. Does anyone know why it's called SARSA? Yeah, right: if you look at it, it spells S-A-R-S-A, and that's literally the reason it's called SARSA. [00:43:24] So what does this algorithm say? You're in a state s, you took an action a, you got a reward r, you ended up in a state s′, and then you took another action a′. For every such quintuple (s, a, r, s′, a′) that you see, you perform this update. [00:43:37] So what is this update doing? This is the convex combination, remember, that we saw before, where you take a part of the old value and merge it with the new value. What is the new value here? It's looking at just the immediate reward — not the full utility, just the immediate reward, which is this 4 here — plus the discount (which is 1 for now) times your estimate. [00:44:04] And remember what the estimate is trying to be: the expectation of the rewards you will get in the future. So if this were actually Q_π rather than Q̂_π, then this would be strictly better, because it would just be reducing the variance. But of course it's not exactly right — there is bias; it's 11, not 12 — but the hope is that it's not biased by too much. [00:44:36] So these would be the values you'd be updating with, rather than these raw values here. [00:44:48] Any questions about what SARSA is doing before I move on? Maybe I'll write something to try to be helpful here: model-free Monte Carlo estimates Q̂_π based on u, and SARSA still estimates Q̂_π, but based on the reward r plus, essentially, Q̂_π. This is not a valid expression, but hopefully the symbols will evoke the right memories. [00:45:31] OK, so let's discuss the differences.
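The SARSA update just described can be sketched in Python (a minimal sketch: the tabular dictionary representation and the fixed η are my assumptions, not the course code):

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, eta=0.5, gamma=1.0):
    """One SARSA update for the quintuple (s, a, r, s', a').

    The target is the immediate reward plus the discounted bootstrapped
    estimate Q-hat_pi(s', a'), not the full rolled-out utility u.
    """
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] = (1 - eta) * Q[(s, a)] + eta * target
    return Q[(s, a)]

# Lecture example: in state "in", take "stay", get reward 4, and
# substitute in the current estimate Q-hat_pi("in", "stay") = 11.
Q = defaultdict(float)  # unseen (s, a) pairs default to 0
Q[("in", "stay")] = 11.0
sarsa_update(Q, "in", "stay", 4, "in", "stay")
```

Because the update only needs the local window (s, a, r, s′, a′), it can run after every transition, without waiting for the episode to end.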
[00:45:38] Whenever people say "bootstrapping" in the context of reinforcement learning, this is kind of what they mean: instead of using u as the prediction target, you use r plus Q̂_π. This is you pulling yourself up by your bootstraps, because you're trying to estimate Q_π, and you don't know Q_π, so you use Q̂_π itself. [00:45:58] OK. So u is based on one path, whereas in SARSA you're based on the estimate, which is based on all your previous experience. Which means that model-free Monte Carlo is unbiased but SARSA is biased, and model-free Monte Carlo has large variance while SARSA has smaller variance. [00:46:26] And one consequence of the way the algorithms are set up is that with model-free Monte Carlo you have to roll out the entire game — basically play the game, or the MDP, until you reach a terminal state — and only then do you have your u to update with. Whereas with SARSA, or any sort of bootstrapping algorithm, you can update immediately, because all you need to see is this very local window of (s, a, r, s′, a′); the update can happen anywhere, and you don't have to wait until the very end to get the value. [00:47:04] OK, so just as a quick sanity check: which of the following algorithms allows you to estimate Q_opt — model-based Monte Carlo, model-free Monte Carlo, or SARSA? [00:47:26] I'll give you maybe 10 seconds to ponder this. OK, let's get a report — I think I didn't reset it from last year, so this includes last year's participants. [00:47:57] So: model-based Monte Carlo allows you to get Q_opt, because once you have the MDP you can get whatever you want, including Q_opt. Model-free Monte Carlo estimates Q_π; it doesn't give you Q_opt. And SARSA also estimates Q_π; it doesn't give you Q_opt either. [00:48:17] All right, so that's kind of a
so that's a kind of a problem like I mean these algorithms are [00:48:24] problem like I mean these algorithms are fine for estimating the value of a [00:48:28] fine for estimating the value of a policy but you really want the optimal [00:48:33] policy but you really want the optimal policy right in fact these can be used [00:48:36] policy right in fact these can be used to improve the policy as well because [00:48:38] to improve the policy as well because you can do something called policy [00:48:41] you can do something called policy improvement which I didn't talk about [00:48:43] improvement which I didn't talk about once you have the Q values you can [00:48:45] once you have the Q values you can define the new policy based on the Q [00:48:47] define the new policy based on the Q values but there's actually a kind of a [00:48:49] values but there's actually a kind of a more direct way to do this okay so so [00:48:53] more direct way to do this okay so so here's the kind of the way mental [00:48:55] here's the kind of the way mental framework you should have in your head [00:48:57] framework you should have in your head so there's two values Q PI and Q opt so [00:49:00] so there's two values Q PI and Q opt so in MDPs we saw that policy evaluation [00:49:03] in MDPs we saw that policy evaluation allows you to get Q PI value iteration [00:49:05] allows you to get Q PI value iteration get allows you get a Q opt and now we're [00:49:08] get allows you get a Q opt and now we're doing reinforcement learning here we saw [00:49:10] doing reinforcement learning here we saw a model free Monte Carlo in star sellout [00:49:12] a model free Monte Carlo in star sellout you get Q PI and now we need I'm gonna [00:49:16] you get Q PI and now we need I'm gonna show you a new algorithm called Q [00:49:17] show you a new algorithm called Q learning that allows you to get Q optin [00:49:26] so this gives you Q opt and it's based [00:49:30] so this gives you Q opt and it's based on 
reward plus Q opt okay so this is [00:49:38] on reward plus Q opt okay so this is going to be very similar to sarsa it's [00:49:41] going to be very similar to sarsa it's only gonna differ by essentially as you [00:49:44] only gonna differ by essentially as you might guess the same difference between [00:49:46] might guess the same difference between policy evaluation and value iteration [00:49:48] policy evaluation and value iteration okay so it's helpful to go back to kind [00:49:53] okay so it's helpful to go back to kind of the MVP recurrences so even though [00:49:55] of the MVP recurrences so even though MVP recurrences can only apply when you [00:49:57] MVP recurrences can only apply when you know the MVP for deriving and [00:49:59] know the MVP for deriving and reinforcement learning algorithms it's [00:50:01] reinforcement learning algorithms it's they can kind of give you inspiration [00:50:03] they can kind of give you inspiration for the actual algorithm okay so [00:50:06] for the actual algorithm okay so remember a Q opt what is Q opt to the Q [00:50:09] remember a Q opt what is Q opt to the Q opt is considering all possible [00:50:11] opt is considering all possible successors the probability immediate [00:50:13] successors the probability immediate reward plus future returns okay so the Q [00:50:17] reward plus future returns okay so the Q learning is this actually really kind of [00:50:19] learning is this actually really kind of clever idea and it's it could also be [00:50:23] clever idea and it's it could also be called czars stars I guess but maybe you [00:50:26] called czars stars I guess but maybe you don't want to call it that [00:50:29] and what it does is as follows so this [00:50:33] and what it does is as follows so this has the same form the convex combination [00:50:35] has the same form the convex combination of the old value and the the new value [00:50:40] of the old value and the the new value right so what is the new value so if you 
[00:50:46] right so what is the new value so if you look at Q opt Q opt is looking at [00:50:49] look at Q opt Q opt is looking at different successors reward plus V opt [00:50:52] different successors reward plus V opt what we're gonna do is well we don't [00:50:55] what we're gonna do is well we don't have all we're not going to be able to [00:50:56] have all we're not going to be able to some of our successors because when our [00:50:58] some of our successors because when our reinforcement learning setting and we [00:50:59] reinforcement learning setting and we only saw one particular successor so [00:51:02] only saw one particular successor so let's just use that successor so on that [00:51:04] let's just use that successor so on that successor we're going to get the reward [00:51:06] successor we're going to get the reward so R is a stand-in for the actual reward [00:51:09] so R is a stand-in for the actual reward I mean is the stand-in for the reward [00:51:12] I mean is the stand-in for the reward reward function and then you have gamma [00:51:15] reward function and then you have gamma times and then V opt I am going to [00:51:19] times and then V opt I am going to replace it with our estimate of what V [00:51:23] replace it with our estimate of what V opt is and what should it be estimate of [00:51:26] opt is and what should it be estimate of V off to be so what relates V up to Q [00:51:35] V off to be so what relates V up to Q opt yeah yeah exactly so you define V [00:51:47] opt yeah yeah exactly so you define V off to be the max over all possible [00:51:49] off to be the max over all possible actions of Q opto of s in that [00:51:51] actions of Q opto of s in that particular action then this is V opt [00:51:55] particular action then this is V opt right so Q is saying I'm in a chance [00:51:57] right so Q is saying I'm in a chance node how much what is optimal utility I [00:52:01] node how much what is optimal utility I can get provided I took an action 
Clearly the best thing to do if you're at a state is just to choose the action that gives you the maximum Q-value. [00:52:15] Okay, so that's just Q-learning. So let's put it side by side with SARSA; these two are very similar. SARSA, remember, updates against r plus gamma Q_pi, and now we're updating against r plus this max over Q_opt. And you can see that SARSA requires knowing what action I'm going to take next, kind of a one-step lookahead a', and it plugs that in here, whereas with Q-learning it doesn't matter what action you take next, because I'm just going to take the one that maximizes. So you can see why SARSA is estimating the value of a policy: the a' that shows up here is a function of the policy, and here I'm kind of insulated from that, because I'm just taking the maximum over all actions.
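Side by side, the two tabular updates can be sketched roughly like this (a minimal sketch, not the course's reference code; I'm storing Q in a dictionary and the variable names are mine):

```python
# Sketch of the two tabular updates, side by side.
# Q: dict mapping (state, action) -> current estimate; eta: step size; gamma: discount.

def sarsa_update(Q, s, a, r, s_next, a_next, eta=0.5, gamma=1.0):
    # On-policy: bootstrap with the action a_next the policy will actually take.
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] = (1 - eta) * Q[(s, a)] + eta * target

def q_learning_update(Q, s, a, r, s_next, actions, eta=0.5, gamma=1.0):
    # Off-policy: bootstrap with the maximizing action, whatever we actually do next.
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] = (1 - eta) * Q[(s, a)] + eta * target
```

The only difference is the target: SARSA plugs in the a' it will actually take, while Q-learning maximizes over actions, which is why it estimates the optimal policy even while following some other one.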
This is the same intuition as for value iteration versus policy evaluation. [00:53:19] Okay, I'll pause here, any questions? Q-learning versus SARSA: is Q-learning on-policy or off-policy? It's off-policy, because I'm following whatever policy I'm following, and yet I get to estimate the value of the optimal policy, which is probably not the one I'm following. [00:53:46] Okay, so let's look at the example here. So here's SARSA, run for a thousand iterations, and like model-free Monte Carlo, the average utility I'm getting is minus 20, and in particular the values I'm getting are all very negative, because this is Q_pi, and the policy I'm following is the random policy. If I replace this with Q-learning, what happens? [00:54:14] So first notice that the average utility is still minus 19, because I actually haven't changed my exploration policy; I'm still doing random exploration.
But notice that the Q_opt values are all around 20, right? And this is because the optimal policy, remember (this is with slip probability 0), is just to go down here and get your 20. And I guess it's kind of interesting that with Q-learning I'm just blindly following the random policy, running off the cliff into the volcano all the time, but I'm learning something: I'm learning how to behave optimally even though I'm not behaving optimally. And that's kind of a hallmark of off-policy learning. [00:55:07] Okay, so any questions about these four algorithms? Model-based Monte Carlo estimates the MDP; model-free Monte Carlo estimates the Q-value of the policy based on the actual returns that you get, the actual sum of the rewards; SARSA is bootstrapping, estimating the same thing but with kind of a one-step lookahead.
And Q-learning is like SARSA, except I'm estimating the optimal policy instead of a fixed policy pi. Yeah? [00:55:49] SARSA is on-policy because I'm estimating Q_pi. All right, okay, so now let's talk about covering the unknown. So these are the algorithms; at this point, if I just hand you some data, if I told you here's a fixed policy and here's some data, you can actually compute all these quantities. But now there's the question of exploration, which we saw was really important, because if you don't even see all the states, how can you possibly act optimally? So which exploration policy should you use? Here are kind of two extremes. [00:56:33] The first extreme is to just set the exploration policy greedily. So imagine we're doing Q-learning now, so you have this Q_opt estimate; it's not the true Q_opt, just an estimate.
The naive thing to do is to just use that Q_opt, figure out which action is best, and always do that action. Okay, so what happens when you do this is you don't do very well. So why don't you do very well? Because initially you explore randomly, and soon you find the 2, and once you've found that 2 you say, well, 2 is better than zero, zero, zero, so I'm just going to keep on going down to the 2, which is all exploitation, no exploration. You don't realize that there's all this other stuff over here. [00:57:30] So in the other direction we have no exploitation, all exploration. Here you kind of have the opposite setup, where I'm running Q-learning, and as we saw before I'm actually able to estimate the Q_opt values, so I learn a lot. But the average utility, which is the actual utility I'm getting by playing this game, is pretty bad: it's the utility you get from just moving randomly.
So kind of what you really want to do is balance exploration and exploitation. [00:58:17] Just as a side commentary, I really feel reinforcement learning captures our life pretty well, because in life you don't know what's going on, you want to get rewards, you want to do well, but at the same time you have to learn about how the world works so that you can improve your policy. So if you think about going to restaurants, or finding a shorter path or a better way to get to school or to work, or even in research, when you're trying to figure out a problem: do you work on the thing that you know how to do and will definitely work, or do you try something new in hopes of learning something, even though maybe it won't get you as high a reward?
So hopefully reinforcement learning is, I don't know, kind of a metaphor for life. [00:59:13] Anyways, okay, so back to concrete stuff. So here is one way you can balance exploration and exploitation. It's called the epsilon-greedy policy, and it assumes that you're doing something like Q-learning, so you have these Q_opt values. The idea is that with probability 1 minus epsilon, where epsilon is, let's say, 0.1, you're just going to exploit, do the best action given everything you have, and then once in a while, with probability epsilon, you're just going to do something random. Okay, so this is actually not a bad policy for acting in life: once in a while maybe you should just do something random and see what happens. [01:00:01] So if you do this, what do you get? What I've done here is set epsilon to start at 1, so that's all exploration.
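As a sketch, the epsilon-greedy rule just described might look like this (my own minimal version, with Q stored as a dictionary):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    # With probability epsilon: explore with a uniformly random action.
    if random.random() < epsilon:
        return random.choice(actions)
    # Otherwise: exploit the current estimate, arg max over actions of Q(state, a).
    return max(actions, key=lambda a: Q[(state, a)])
```

Decaying epsilon over time, as in the demo that follows, just means passing in a smaller epsilon as learning progresses.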
And then a third of the way in I'm going to change the value to 0.5, and two-thirds of the way in I'm going to change it to 0. Okay, so if I do this, then I actually estimate the values really well, and I also get utility that's pretty good, you know, 32. And this is also kind of something that happens as you get older: you tend to explore less and exploit more; it just happens. [01:00:56] All right, so that was exploration. So let's put some stuff on the board here. Do I need this anymore? Maybe this, okay. [01:01:11] Okay, so covering the unknown. We talked about exploration, epsilon-greedy, and there are other ways to do this; epsilon-greedy is just kind of the simplest thing that actually works remarkably well, even in state-of-the-art systems. [01:01:36] The other problem I'm going to talk about now is generalization. So remember what we said about exploration.
If you don't see a particular state, then you don't know what to do in it. But think about that for a moment; it's kind of unreasonable, because in life you're never going to be in the exact same situation twice, and yet we need to be able to act properly. So the general problem is that the state space you might deal with in a real-world situation is enormous, and there's no way you're going to track down every possible state. [01:02:10] Now, this state space is actually not that enormous, but it's the biggest state space I could draw on the screen, and you can see that the average utility is pretty bad here. So what can we do about this? Let's talk about large state spaces; this is the problem. [01:02:36] And this is where the third interpretation of model-free Monte Carlo will come in handy.
So let's take a look at Q-learning. [01:02:49] In the context of SGD it looks like this: it's kind of a gradient step, where you take the old value and subtract eta times something that looks like it could be a gradient, which is the residual here. One thing to note is that under the formulation of Q-learning I've talked about so far, this is what we'd call rote learning, which, two weeks ago, we said is kind of ridiculous, because it's not really learning; we're not generalizing at all. Basically, for every single state and action I have a value, and if I have a different state and action, a completely different value. There's no sharing of information, and naturally if I do that I can't generalize between states and actions. [01:03:45] Okay, so here's the key idea that will allow us to overcome this.
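To make that gradient-step reading concrete, the tabular update can be rewritten (in my notation, with target t) as:

```latex
% Tabular Q-learning as a gradient-style step (same update, rearranged):
\hat{Q}_{\text{opt}}(s,a) \;\leftarrow\; (1-\eta)\,\hat{Q}_{\text{opt}}(s,a) + \eta\,t
\;=\; \hat{Q}_{\text{opt}}(s,a) - \eta\,\underbrace{\bigl(\hat{Q}_{\text{opt}}(s,a) - t\bigr)}_{\text{residual}},
\qquad t = r + \gamma \max_{a'} \hat{Q}_{\text{opt}}(s',a').
```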
It's called function approximation in the context of reinforcement learning; in normal machine learning it's just called machine learning. The way it works is this: we're going to define this Q_opt(s, a) not as a lookup table; it's going to depend on some parameters w, and I'm going to define this function to be w dot phi(s, a). [01:04:16] So I'm going to define this feature vector very similarly to how we did in the machine learning section, except instead of (s, a) we had x. So what kind of features might you have? You might have, for example, features on actions; these are indicator features, saying maybe it's better to go east than to go west, or maybe it's better to be in the fifth row, or it's good to be in the sixth column, and things like that.
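For instance, the indicator features just mentioned could be written as a sparse feature map (a hypothetical example; the feature names are mine):

```python
def phi(state, action):
    # Hypothetical indicator features for a grid world whose state is (row, col).
    row, col = state
    return {
        "action=" + action: 1.0,  # e.g. "is the agent moving east?"
        "row=" + str(row): 1.0,   # e.g. "is the agent in the fifth row?"
        "col=" + str(col): 1.0,   # e.g. "is the agent in the sixth column?"
    }
```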
So you have a smaller set of features, and you try to use that to generalize across all the different states that you might see. [01:05:03] So what this looks like now, with the features, is actually the same as before, except now we have something that really looks like the machine learning lectures: you take your weight vector and you do an update of the residual times the feature vector. How many of you find this familiar from linear regression? All right, so just as a contrast: before, we were just updating the Q_opt values, and the residual was exactly the same; now what we're doing is updating not the Q-values but the weights. The residual is the same, and the thing that connects the Q-values with the weights, through the residual, is the feature vector.
[01:06:00] Okay, as a sanity check, this has the same dimension: this is a vector, this is a scalar, and this is a vector which has the same dimensionality as w. And if you want to derive this, you can think of the implied objective function as simply linear regression: you have a model that's trying to predict a value from an input (s, a), so (s, a) is like x, the prediction Q_opt(s, a; w) is like the model output, and this target is like the y that you're trying to predict; you're just trying to make the prediction close to the target. [01:06:45] Yeah, question? Yeah, so a good question: what is this eta now, is it the same as before or is it new? When we first started talking about these algorithms, eta was supposed to be one over the number of updates and so on, but once you get into the SGD form like this, it just behaves as a step size that you can tune to your heart's content.
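Putting the pieces together, one Q-learning step with linear function approximation might be sketched like this (my own minimal version: the weights and features are sparse dictionaries, and phi is any feature map from (state, action) to such a dictionary):

```python
def q_learning_step(w, phi, s, a, r, s_next, actions, eta=0.1, gamma=1.0):
    # Q(s, a; w) = w . phi(s, a), with w and phi(s, a) as sparse dicts.
    def q(state, action):
        return sum(w.get(f, 0.0) * v for f, v in phi(state, action).items())

    # Bootstrapped target: r + gamma * max over a' of Q(s', a'; w).
    target = r + gamma * max(q(s_next, b) for b in actions)
    residual = q(s, a) - target

    # Gradient-style update: w <- w - eta * residual * phi(s, a).
    for f, v in phi(s, a).items():
        w[f] = w.get(f, 0.0) - eta * residual * v
```

With one indicator feature per (state, action) pair this reduces exactly to the tabular update; with shared features it generalizes across states you have never visited.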
[01:07:13] All right, so that's all I'll say about these two challenges. One is how you do exploration: you can use epsilon-greedy, which allows you to balance exploration with exploitation. And the second is that for large state spaces, epsilon-greedy isn't going to cut it, because you're not going to see all the states even if you try really hard, and you need something like function approximation to tell you about new states that you fundamentally haven't seen before. [01:07:53] Okay, so, summary so far. We're in the online setting; this is the game of reinforcement learning: you have to learn and take actions in the real world. One of the key challenges is the exploration-exploitation trade-off. We saw four algorithms, and there are two key ideas here. One is Monte Carlo, which is that from data alone you can basically use averages to estimate quantities that you care about, for example transitions, rewards, and Q-values.
And the second key idea is bootstrapping, which shows up in SARSA and Q-learning: you're updating towards a target that depends on your estimate of what you're trying to predict, not just the raw data that you see. [01:08:45] Okay, so now I'm going to step back a little bit and talk about reinforcement learning in the context of some other things. There are two things that happened when we went from binary classification, which was two weeks ago, to reinforcement learning now, and it's worth decoupling them: one is state and one is feedback. [01:09:07] The idea of partial feedback is that you can only learn about actions you take. This is kind of obvious in reinforcement learning: if you don't quit in this game, you never know how much money you would get.
And the other idea is the notion of state, which is that your rewards depend on your previous actions; if you're going through the volcano example, you're in a different situation depending on where you are in the map. [01:09:52] So you can draw a two-by-two grid, where you start from supervised learning, which is stateless with full feedback. There's no state: every iteration you just get a new example, and the prediction has no dependency on the previous examples. And it's full feedback, because in supervised learning you're told the correct label; even if there might be a thousand labels, for example in image classification, you're still told which one is the correct label.
And now in reinforcement learning, both of those are made harder. There are two other interesting points. What's called multi-armed bandits you can think of as a warm-up to reinforcement learning, where there's partial feedback but no state, which makes it easier. And you can also get full feedback but with state: in structured prediction, for example machine translation, you're told what the translation output should be, but clearly actions depend on previous actions, because you can't just translate words in isolation, essentially. [01:11:08] Okay, so one of the things I'll mention very briefly is deep reinforcement learning, which has been very popular in recent years. There was a lot of interest in reinforcement learning in the 90s, when a lot of the algorithms and the theory were developed.
Okay, so one of the things I'll just mention very briefly is that deep reinforcement learning has been very popular in recent years. In reinforcement learning there was a lot of interest in the 90s, when a lot of the algorithms and the theory were developed; then there was a period where not as much happened, and since, I guess, 2013 there's been a revival of reinforcement learning research. A lot of that is due to DeepMind, where they published a paper showing how they could use deep reinforcement learning to play Atari. This will be talked about more in section this Friday, but the basic idea of deep [01:11:55] reinforcement learning, just to demystify things, is that you're using a neural network for Q_opt — essentially that's what it is. There are also a lot of tricks to make this work, which are necessary when you're dealing with enormous state spaces. One of the things that's different about deep reinforcement learning is that people are much more ambitious about handling problems where the state spaces are enormous. Here the state is just the
pixels, right, so there's a huge number of possible states, whereas before people were in what is known as the tabular case, where there's a number of states you can actually enumerate. So there are a lot of details here to take care of. [01:12:42] One general comment is that reinforcement learning is, honestly, really hard, because of this statefulness and also the delayed feedback. So when you're maybe thinking about final projects: it's a really cool area, but don't underestimate how much work and compute you need. [01:13:03] Some other things I won't have time to talk about: so far we've talked about methods that try to estimate the Q function; there's also a way to do without the Q function and just try to estimate the policy directly — those are methods like policy gradient.
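To demystify "a neural network for Q_opt" a little further, here is a sketch of Q-learning with a function approximator on a made-up five-state chain (illustrative only — a real deep-RL system would replace the one-hot features and linear weights below with a neural network, plus tricks such as replay buffers):

```python
import random

# Made-up 5-state chain MDP: states 0..4, actions -1/+1,
# reward 10 for reaching state 4, otherwise -1 per step.
def step(s, a):
    s2 = max(0, min(4, s + a))
    return s2, (10.0 if s2 == 4 else -1.0)

def features(s, a):
    # One-hot features over (state, action); this is the part a
    # deep-RL system would replace with a neural network.
    phi = [0.0] * 10
    phi[s * 2 + (0 if a == -1 else 1)] = 1.0
    return phi

def q(w, s, a):
    return sum(wi * xi for wi, xi in zip(w, features(s, a)))

def q_learning(episodes=500, eta=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    w = [0.0] * 10
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            if rng.random() < epsilon:
                a = rng.choice([-1, 1])                      # explore
            else:
                a = max([-1, 1], key=lambda b: q(w, s, b))   # exploit
            s2, r = step(s, a)
            # TD target: reward plus discounted value of the next state.
            target = r if s2 == 4 else r + gamma * max(q(w, s2, b) for b in (-1, 1))
            error = q(w, s, a) - target
            for i, xi in enumerate(features(s, a)):
                w[i] -= eta * error * xi                     # gradient step on w
            s = s2
            if s == 4:
                break
    return w

w = q_learning()
```

After training, the greedy policy reads off argmax over actions of Q(s, a), which on this toy chain moves right toward the reward at the end.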
There are also methods like actor-critic that try to combine these value-based methods and policy-based methods. These are used in DeepMind's AlphaGo and AlphaZero programs for, you know, crushing humans at Go — this one will actually be deferred to next week's section, because it's in the context of games. [01:13:46] There are a bunch of other applications: you can fly helicopters, play backgammon — this is actually one of the early examples; TD-Gammon in the nineties was one of the success stories of using reinforcement learning, and in particular self-play. For non-games, reinforcement learning can be used to do things like scheduling and managing data centers, and so on. [01:14:12] Okay, so that concludes the section on Markov decision processes, where the idea is that we are playing against nature — nature is kind of random, but, you
know, kind of neutral. Next time we're going to play against an opponent that's out to get you, so we'll see about that.
================================================================================ LECTURE 021 ================================================================================
Game Playing 1 - Minimax, Alpha-beta Pruning | Stanford CS221: AI (Autumn 2019)
Source: https://www.youtube.com/watch?v=3pU-Hrz_xy4
---
Transcript
[00:00:04] All right, let's start, guys. Okay, so a few announcements before we start. If you need OAE accommodations, please let us know if you haven't done that already — you need to let us know by October 31st, because we need to figure out the alternate exam date. We'll get back to you about the exact details around the alternate exam date, but let us know by October 31st. Project proposals are also due this Thursday, so talk to the TAs, talk to us, come to office hours. [00:00:40] All right, so today we want to talk about games. We have started talking about this idea of state-based models — the fact that you want to have state as a way
of representing everything that we need to plan for the future. We talked about search problems already, and we talked about MDPs, where we have a setting in which we're playing against nature, and nature can play probabilistically, and based on that we need to respond. And today we want to talk about games. So the setup is: we have two players playing against each other. We're not necessarily playing against nature, which can act probabilistically — we're actually playing against another intelligent agent that's deciding for his own or her own good. So that's the main idea of games. [00:01:27] All right, so let's start with an example — this is actually an example that we're going to use throughout the lecture. The example is: we have three buckets, A, B, and C, and then you're choosing one of these three
buckets, and then I choose a number from the bucket. Your goal here is to maximize the chosen number, and the question is: which bucket would you choose? [00:01:46] Okay, so how many of you would choose bucket A? No one trusts me, okay. How many of you would choose B? Okay. So now, if people don't trust me, how many of you would choose C? Okay, so there's a number of people there. [00:02:08] So how are you making that decision? If you choose A, you're basically assuming that I'm not trying to get you — I might actually give you 50, and if I give you fifty, that'll be awesome: you have this very large value that you're trying to maximize. If you think I'm going to act adversarially and go against you and try to minimize your number, then you're going to choose bucket B, right,
because, worst-case scenario, I'll choose the lowest number in the bucket, and in bucket B the lowest number is one, which is better than minus 50 and minus 5. So if you're assuming I'm trying to minimize your good, then you're going to choose bucket B. And if you have no idea how I'm playing, and you're just assuming maybe I'm acting stochastically — maybe I'm flipping a coin and then based on that deciding what number to give you — you might choose C, because in expectation C is not bad. If you just average out the numbers in A and B and C, the average value for A is 0, for B it's 2, and for C it's 5. So if I'm playing stochastically, you might say: oh, I'm probably going to get something around 5, so you would pick C.
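The three models of the opponent just described can be written down directly. The bucket contents below are hypothetical values consistent with the numbers mentioned (A's extremes are -50 and 50, average 0; B's worst case is 1, average 2; C's worst case is -5, average 5):

```python
# Hypothetical bucket contents consistent with the numbers in the lecture:
# A = {-50, 50}, B's minimum is 1 with average 2, C's minimum is -5 with average 5.
buckets = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}

def value(numbers, opponent):
    """Value of a bucket under a model of the opponent:
    'max' = helpful opponent, 'min' = adversarial, 'mean' = random (uniform)."""
    if opponent == "max":
        return max(numbers)
    if opponent == "min":
        return min(numbers)
    return sum(numbers) / len(numbers)

def best_bucket(opponent):
    return max(buckets, key=lambda b: value(buckets[b], opponent))
```

`best_bucket("max")` gives A, `best_bucket("min")` gives B (the minimax-style answer), and `best_bucket("mean")` gives C (the expectimax-style answer) — matching the three answers above.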
[00:03:22] Okay, so today we want to talk about these different policies that you might choose in these settings, how we should model our opponent, and how we formalize these problems as game problems — this is the example that we just started with. [00:03:32] So the plan is to formalize games, then talk about how we compute values in the setting of games — we're going to talk about expectimax and minimax — and then towards the end of the lecture we're going to talk about how to make things faster. We're going to talk about evaluation functions as a way of making things faster, which is using domain knowledge to define evaluation functions over nodes, and we're also going to talk about alpha-beta pruning, which is a more general way of pruning your tree and making things faster. All right, so that's the plan for today. [00:04:04] Okay, so we just defined this game, and the way to go about this game is to create
something that's called a game tree. A game tree is very similar to a search tree — this might remind you of the search trees we talked about two weeks ago. [00:04:18] So the idea is we have this game tree, where we have nodes in the tree, and each node is a decision point for a player. We have different players here — I was playing, or you were playing; we have two different people playing here — so each of these decision nodes belongs to one of the players, not both of them. And then each root-to-leaf path is going to be a possible outcome of the game. [00:04:40] So it could be that your decision was to pick bucket A and then I chose minus 50; that path gives us one possible outcome of how things can go. That is what the tree is representing here. So the nodes in
the first level are the decisions that I was making, and the first node, the root node, is the decision that you were making in this setting. [00:05:05] So if we were to formalize this a little bit more, we're going to formalize this problem as a [00:05:09] two-player zero-sum game. In this class, at least today, we're going to talk about two-player games, where we have an agent and we have an opponent, and then we're going to talk about policies and values. For all of those things, think of yourself as being the agent: you're playing for the agent, you're optimizing for the agent, and the opponent is playing against you. [00:05:38] Today we're also going to talk about games that are turn-taking games — we're going to talk about things like chess. We're not talking about things like rock-paper-scissors; we'll actually talk about that next time, when we have simultaneous
games, where you're playing simultaneously. Today we're talking about turn-taking settings — two-player, turn-taking. Full observability: we see everything. We're not talking about games like poker, where you have partial observation and you don't necessarily see the hand of your opponent. So: full observation, two-player, and also zero-sum games. What zero-sum means is that if I'm winning — if I'm getting, like, ten dollars from winning — then my opponent is losing ten dollars, so the total utility is going to be equal to zero: if I win some amount, my opponent is losing the same amount. [00:06:22] All right, so what are the things that we need when we define games? We need to know the players: we have the agent, we have the opponent. In addition to that, you need to define a bunch of things, and this should remind you of the search lecture. So we
might have a start state — that's where we start. [00:06:35] We have actions, which is a function of state and gives us the possible actions from a state s. Similar to before, you have a successor function, like in search problems: the successor function takes a state and an action and tells us the resulting state you're going to end up in. And you have an isEnd function, which checks whether you're in an end state or not. There are two things that are different here. One is this utility function, and the utility function basically gives us the agent's utility at the end state. [00:07:10] So one thing to notice here is that the utility only comes at an end state. After you finish the game — like, I've played my chess game and I've won this chess game — then I get my utility. As I'm making moves through my chess game, I'm not getting
any utility — you only get the utility at an end state. And note the way we defined the utility: we're defining it for the agent, because again, we're playing from the perspective of the agent. So what would be the utility of the opponent? Minus that, right — the negation of that would be the utility of the opponent. [00:07:47] [Student: I've heard about partially observable Markov decision processes — is this kind of what that is?] Okay, so the question is: is this a partially observable Markov decision process? This is not a partially observable Markov decision process. There are classes that talk about decision-making under uncertainty — Mykel Kochenderfer's class actually teaches that, so you should take classes on that. This is not a partially observable Markov decision process; this is fully observable, and you have two
players playing against each other, so it's a very different setup. [00:08:20] So the question is: is there any randomness here? So far I haven't discussed any randomness yet; later in the lecture I'll actually talk about the case where there might be a nature in the middle that acts randomly, and how we go about that. But so far, it's two players playing against each other. [00:08:36] All right, and then the other thing that we need to define when we're defining a game is the player. A player is a function of state that basically tells us who is in control — who is playing now. In the game of chess, whose turn is it now? Player is a function that we're going to define when we're formally defining a game. [00:08:56] All right, so let's look at an example. We have a game of chess; the players are White and Black. Let's say you're playing for White, so the agent is White, and the
opponent is Black. The state s can represent the position of all the pieces and whose turn it is — that is what the state is representing: whose player's turn it is and the position of all the pieces. Actions would be all the legal chess moves that Player(s) can take, and isEnd basically checks if the state is checkmate or a draw. [00:09:29] So then what would the utility be? You're only going to get it when you win, or when you lose, or if there's a draw. The way we're defining it: it's going to be, let's say, plus infinity if White wins — because the agent is White — zero if there's a draw, and minus infinity if Black wins. So that was all the things that we would need to define. Yes? [00:10:01] [Student: why do we have whose turn it is in the state?]
So that's one way of actually extracting the player function: you can define the player function as a function of state, because the state already needs to encode whose turn it is, so you can extract it that way. [00:10:19] [Student: you said the utility would be negative for the opponent — is that assuming both are taking the same actions?] No — this is turn-taking, right? I take an action, then the opponent takes an action, then the agent takes an action, the opponent takes an action, and at the very end of the game you get the utility, and the opponent gets the negative of that utility. But the actions could be very different, the policies could be very different, and we'll talk about how to come up with those. [00:10:43] [Student: if White wins you get plus infinity, but what if
Black wins and you don't have a zero-sum game?] We'll talk about that next lecture, actually, a little bit. [00:11:00] I'm talking about zero-sum games here because the algorithms we're talking about are for zero-sum games — we're going to talk about minimax-type policies, where the opponent is minimizing and the agent is maximizing. So I'll get back to that, and if I haven't answered it we can talk about it after class, but next lecture we'll also talk about more variations of games. For now I'm making a bunch of simplifying assumptions, like turn-taking. [00:11:33] [Student: why do the utilities add up to zero?] Yeah, so the utilities need to add up to zero: if White wins, maybe White gets ten, but then Black gets minus ten.
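As a sketch, the ingredients listed so far — start state, actions, successor, isEnd, utility, and the player function — can be written down for the bucket example. This is illustrative, not the course's code: the state encoding (a player/choice pair, with player 0 marking an end state) and the bucket contents are assumptions.

```python
class BucketGame:
    """The bucket example written against the ingredients just listed.
    State is (player, choice): player +1 is the agent, -1 the opponent,
    and player 0 marks an end state (an encoding made up for this sketch)."""
    BUCKETS = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}  # hypothetical values

    def start_state(self):
        return (+1, None)                       # agent moves first

    def player(self, state):
        return state[0]

    def actions(self, state):
        _, choice = state
        if choice is None:
            return list(self.BUCKETS)           # agent picks a bucket
        return list(self.BUCKETS[choice])       # opponent picks a number

    def succ(self, state, action):
        _, choice = state
        if choice is None:
            return (-1, action)                 # opponent moves next
        return (0, action)                      # number chosen, game over

    def is_end(self, state):
        return state[0] == 0

    def utility(self, state):
        assert self.is_end(state)               # utility only at end states
        return state[1]
```

By the zero-sum assumption, the opponent's utility at an end state is just the negation of `utility(state)`.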
[00:11:46] So the characteristics of games that we've already discussed come down to two main things. One is that all utilities are at the end state: throughout the path you're not getting utilities, as opposed to something like MDPs, where we were getting rewards throughout the path. Here the utility only comes in at the very end, at the end state. The other is that different players are in control at different states: if you're in a state, you might not be able to control things; it might be your opponent's turn, and you might not be able to do anything. Okay, so those are the two main characteristics of games.

[00:12:19] All right, so let's look at a game that you're going to play. The game is the halving game: we start with a number N, and the players take turns. They can do two things: they can either subtract one, decrementing N, or they can replace N with N over two. So they can divide or subtract. And the player that's left with zero is going to win. Okay, so that's the setup.

[00:12:45] So let's try to formalize the game, and after that we'll figure out what a good policy for it is. For now, let's just formalize it: what are all the different pieces of the model? Let's open a new file where we're going to define this game, a HalvingGame class. We're initializing with n, so we're starting with some number n. What is our state? Our state is going to encode whose turn it is and that number n. So we have a player; let's say our players are either plus one or minus one, and that's how I'm labeling whose turn it is. The start state: let's say player plus one plays first with n, so that is (+1, n),
and then we need to define isEnd. What should isEnd check? Well, we take the state, decouple it into player and number, and if the number is equal to 0, that's when the game ends. That's our ending condition.

[00:13:54] How about utility? We get the utility at an end state. So again I take a state, decouple it into player and number, and make sure we're in an end state: we assert that number is equal to 0, because that's what defines whether you're in an end state. Then the utility: if I'm winning I get infinity, and if I'm not winning I get minus infinity. The way I'm defining that here is just player times infinity. I, the agent, am player plus one, and the opponent is player minus one, so if minus one is winning I get minus infinity.

[00:14:32] The actions we can take: we can subtract one or we can divide by two; subtract and divide are the two actions. The player function again takes a state; I decouple the state into player and number and just return the player. That's how I know whose turn it is.

[00:14:51] Then we need to define the successor function. The successor function takes a state and an action and tells us what state you're going to end up in. So again I decouple the state into a player and a number, and there are two actions I can take: subtract one or divide by two. If I'm subtracting, I return a new state whose player is -player (because now it's the other player's turn, whoever's turn it is flips) together with number minus one. If the action is divide, we return the new player, which is -player, and number over two.
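The formalization live-coded here can be sketched in Python roughly as follows. The class and method names (HalvingGame, startState, isEnd, utility, actions, player, succ) follow the lecture's conventions, but the exact code is a reconstruction from the description, not a copy of the lecture's file:

```python
class HalvingGame:
    def __init__(self, N):
        self.N = N  # starting number

    # State = (player, number); players are +1 (agent) and -1 (opponent).
    def startState(self):
        return (+1, self.N)

    def isEnd(self, state):
        player, number = state
        return number == 0

    def utility(self, state):
        # Utilities exist only at end states: player times infinity,
        # so +inf if the agent (+1) is the winner, -inf otherwise.
        player, number = state
        assert number == 0
        return player * float('inf')

    def actions(self, state):
        return ['-', '/']  # subtract one, or divide by two

    def player(self, state):
        player, number = state
        return player

    def succ(self, state, action):
        # Taking an action flips whose turn it is and updates the number.
        player, number = state
        if action == '-':
            return (-player, number - 1)
        return (-player, number // 2)
```

The `//` (integer division) is an assumption; the lecture just says "replace n with n over two."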
Okay, that is it; so we just defined this game.

[00:15:40] All right, so that was my game. We're going to play this game in a little bit, but before playing it, let's quickly talk about what a solution to a game is. What are we trying to do in a game? If you remember MDPs, the solution was a policy: a policy was a function of state, and it would return the action you need to take in that state. Similarly, here we have policies, but the thing is, I have two players, so the policy should depend on the player too. So I have pi_p, which is the policy of player p, and I can define it similarly to before: it can be a function of a state that returns just an action. That would be a deterministic policy: deterministically, if I'm in a state, the policy tells me what action to take. [00:16:26] But we can also define
stochastic policies. What a stochastic policy does is take a state and an action and return a number between 0 and 1, which is the probability of taking that action. So the policy pi_p of a state and an action returns the probability of player p taking action a in state s. If you remember the bucket example: maybe half the time I'd pick the number on the right and half the time I'd pick the number on the left. That would be a stochastic policy; I'm not deterministically telling you what the action is, I'm giving you a stochastic description of the policy I'm following. Okay, so we have deterministic policies and stochastic policies, and in our game we could follow either one of them.

[00:17:14] [Student question:] Under what case would you want a stochastic policy versus a deterministic policy?

We'll cover that a little bit more next time. Depending on what game you're in, there are settings where stochastic policies give us some properties and deterministic policies give us other properties; right now we're just defining them as things that could exist. We could model our opponent as acting deterministically if we know exactly what they're doing; sometimes we have no idea. Maybe I've learned their behavior somehow and there's some randomness there, and then I'd use a stochastic policy for how my opponent is going to play against me. But we'll talk about what we get out of stochastic versus deterministic policies a little bit more next time. Yes, all right.
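As a rough sketch of the two representations (the function names here are illustrative, not from the lecture): a deterministic policy maps a state to an action, a stochastic policy maps a state-action pair to a probability, and a deterministic policy is the special case that puts probability one on a single action:

```python
# Deterministic policy: state -> action.  For the halving game a state
# is (player, number); this illustrative policy always decrements.
def alwaysSubtractPolicy(state):
    return '-'

# Stochastic policy: (state, action) -> probability in [0, 1].
# Like the bucket example: each of the two choices half the time.
def uniformPolicy(state, action):
    return 0.5

# View a deterministic policy as a stochastic one: probability 1 on
# the chosen action, 0 on everything else.
def asStochastic(detPolicy):
    def pi(state, action):
        return 1.0 if detPolicy(state) == action else 0.0
    return pi
```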
[00:18:07] Okay, so now that we know that it's a policy we want, let's try to write up a policy for this game. I'm going to define a human policy, and what I mean by that is that it's going to come from a human; that means one of you, or two of you, so I'll need two volunteers. But first let's quickly write it up. What is a human policy? It just gets its input from the keyboard. So what I'm writing here is: get the action from the keyboard (remember the actions are either divide or subtract one), and if the action is valid, return that action. That sounds like a good policy.

[00:18:53] So that's a human policy. Now I want this game to actually play out, so I need policies for my players: my agent is plus one, and that's going to be a human policy, and my opponent is also going to be a human policy. I just want two humans playing against each other. And for the game, let's say we're starting with 15, so the number we're starting with is 15. [00:19:23] All right, that looks right to me.

[00:19:28] So how do we make sure we're progressing in the game? If you're not in an end state, you want to progress. So let's print a few things here: print our state. Let's get the player out of the state, because again the state encodes the player. Let's get the policy, because we've defined these policies for both of the players, so we can look up the policy for whoever is playing right now. Then the action comes from the policy in that state,
and the new state you're going to end up in is just the successor of the current state and action, so I'm just progressing. This while loop figures out what state we're in, what policy we're following, and where we end up via the successor function. And at the very end I just print out the utility, which is either plus infinity or minus infinity. That sounds good.

[00:20:32] All right, so who wants to play this? Okay, that's one person: you're the agent, player plus one. And you in the white shirt: you're minus one. All right, so let's play. [00:20:58] Player plus one, we're at number 15. You want to decrement? Okay. So you're player minus one, we're at 14; what do you want to do? Divide? Okay, you have a policy there, minus one. [00:21:23] Oh yeah, so I think you get the point, right?
[00:21:36] [Crosstalk as the game finishes.] You get the utility at the end. I was going to try another pair, but the code is online; if you want to play with it, just play with it, and we'll have one other version playing against an automated policy later.

[00:21:58] All right, so we're back here. We just saw how we can write human policies and have them play against each other. And again, a policy: you give it a state and an action and it gives you a probability, or you give it a state and it gives you an action. A deterministic policy is just an instance of a stochastic policy: if you have a deterministic policy, you can treat it as a stochastic policy where with probability one you pick a particular action.
All right, so now we want to talk about how we evaluate a game. Let's say someone comes and gives me the policy of an agent and of an opponent, and I just want to know how good that is. If you remember, the MDP lecture started with policy evaluation, this idea that someone gives me the policy and you just want to evaluate how good it is, and we're going to do something exactly analogous to that. Someone comes and tells me: my agent is going to pick bucket A, that's what my agent will do all the time. And someone says: my opponent is going to act stochastically, and with probability 1/2 it picks one of the two numbers. Okay, so those are the two policies we're given, and the question is: how good is this?

[00:23:15] Going back to the game tree, what's really happening is that my agent is going to pick this branch, because it picks bucket A. So with probability 1 we end up here, and with probability 0 we end up in any of the other buckets. Then my opponent stochastically picks either minus 50 or 50. If my opponent is picking minus 50 or 50, the value of that node is just the expectation: 50% of the time it's minus 50 and 50% of the time it's 50, so the value of that node is zero. And then if my agent is picking A, the value of this node is going to be zero too. [00:23:55] So you can see how the value propagates up from the utilities: the utilities are at the leaf nodes, but we can actually compute a value for each one of these nodes if we know what the policies are;
if I know who is following what policy, I can compute these values and go up the tree. So in this case I can say the value of the start state, when I'm evaluating this particular policy, is equal to zero. [00:24:20] All right: someone gave me the policy, and I evaluated the value of the start state. In general, as I was saying earlier, this is similar to policy evaluation, the case where someone gives me the policies and I evaluate how good the situation is, and you can write a recurrence to actually compute that.

[00:24:41] So let me write the recurrence here. We want to compute this value, V_eval: it's evaluating a given policy, and it's a function of state. What is it going to be equal to? It's Utility(s) if we're already in an end state. Otherwise, I have access to the policy of my agent and the policy of my opponent, so I can just take an expectation. If Player(s) is the agent, it's a sum over all possible actions a of pi_agent(s, a) (say it's a stochastic policy) times V_eval of the successor state, Succ(s, a); that's the case where Player(s) equals agent. [00:25:46] And what happens if my player is the opponent? I do the same thing. I have access to the policy of the opponent (it's given to me), so it's again a sum over all possible actions a of pi_opp(s, a) times V_eval of the successor state Succ(s, a), and that's the case where Player(s) is the opponent.
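The recurrence on the board can be sketched as a recursive function. The bucket game below is a minimal stand-in for the lecture's example; the numbers in buckets B and C are assumptions chosen to reproduce the node values 0, 2, and 5 quoted in lecture:

```python
def valueEval(game, policies, state):
    # V_eval(s) = Utility(s) at an end state; otherwise the expectation
    # over actions under the current player's given stochastic policy
    # pi(state, action) -> probability.
    if game.isEnd(state):
        return game.utility(state)
    pi = policies[game.player(state)]
    return sum(pi(state, a) * valueEval(game, policies, game.succ(state, a))
               for a in game.actions(state))

class BucketGame:
    # Two-move game: the agent (+1) picks a bucket, then the opponent (-1)
    # picks one of the two numbers in it (bucket contents assumed).
    buckets = {'A': [-50, 50], 'B': [1, 3], 'C': [-5, 15]}
    def startState(self): return ('agent',)
    def isEnd(self, state): return state[0] == 'end'
    def utility(self, state): return state[1]
    def player(self, state): return +1 if state[0] == 'agent' else -1
    def actions(self, state):
        return list(self.buckets) if state[0] == 'agent' else [0, 1]
    def succ(self, state, action):
        if state[0] == 'agent':
            return ('opp', action)
        return ('end', self.buckets[state[1]][action])
```

Evaluating the policies from the example (the agent always picks bucket A, the opponent picks either number with probability 1/2) gives a start-state value of 0, matching the walk-through above.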
So that's the recurrence we were going to write, and it's pretty intuitive; we've seen this in tree search too. You start with the utilities at the leaf nodes and push them back up, based on what your policies are and what they tell you about which edges of the tree you take with what probability. Okay, does this make sense? All right.

[00:26:40] Okay, so that was evaluating the game. But what if now I want to solve for what the agent should do? I am the agent; I care about figuring out what my pi_agent is. I don't know what my pi_agent is; I need to figure out what sort of policy I should be following. And that takes us to this idea of expectimax, which is basically: if I'm in a scenario where I know what my opponent does (I'm still assuming I know what my opponent does), what would be the best thing for me to do as an agent? [00:27:11] What would be the best thing to do? Like in the bucket example, where I was acting probabilistically, what would you do?

[00:27:23] [Student: pick the action that gives you the maximum.] Right, you pick the action that gives you the maximum value, because you're trying to maximize your own value. So then this recurrence needs to change. The way it changes: I'm going to call this a new value, and I'll just write on top of what we have rather than rewriting it. I'm going to call it the value of the expectimax policy. So this V_eval: I'm not evaluating anything anymore, I actually want to figure out what my agent should do, so I'm going to call it V_expectimax. And if I know the policy of my opponent, I'm not changing anything in the opponent case, because I know that policy; I just compute the expectation. But now I want to figure out what
the agent should [00:28:12] want to figure out what the agent should do and what should the agent do mall the [00:28:13] do and what should the agent do mall the agent should do the thing that maximizes [00:28:15] agent should do the thing that maximizes this value so I'm gonna erase this sum [00:28:19] this value so I'm gonna erase this sum with the policy because I don't have [00:28:21] with the policy because I don't have that policy and the agent should do the [00:28:23] that policy and the agent should do the thing that maximizes this value so this [00:28:32] thing that maximizes this value so this should remind you of value iteration so [00:28:35] should remind you of value iteration so if you remember value iteration in the [00:28:37] if you remember value iteration in the MVP lecture if you weren't evaluating [00:28:40] MVP lecture if you weren't evaluating things right you were trying to maximize [00:28:41] things right you were trying to maximize our value and that's kind of like [00:28:43] our value and that's kind of like analogous to what we are doing here [00:28:45] analogous to what we are doing here you're trying to figure out what should [00:28:47] you're trying to figure out what should be the policy that the agent should take [00:28:48] be the policy that the agent should take that maximizes the value under the [00:28:51] that maximizes the value under the scenario that I know what the opponent [00:28:52] scenario that I know what the opponent does so I still kind of know what the [00:28:54] does so I still kind of know what the opponent does so going back to this [00:28:57] opponent does so going back to this example so let's say I know my opponent [00:28:59] example so let's say I know my opponent is acting stochastically what should I [00:29:01] is acting stochastically what should I do [00:29:01] do so if my opponent is acting [00:29:03] so if my opponent is acting stochastically with probability 1/2 then [00:29:05] stochastically with 
probability 1/2 then the values of each one of these buckets [00:29:07] the values of each one of these buckets are going to be 0 2 and 5 and I'm trying [00:29:10] are going to be 0 2 and 5 and I'm trying to maximize my own you to my own value [00:29:13] to maximize my own you to my own value so I'm gonna pick the one that gives me [00:29:15] so I'm gonna pick the one that gives me 5 and then that's shown with this upward [00:29:16] 5 and then that's shown with this upward triangle I'm trying to maximize so I'm [00:29:18] triangle I'm trying to maximize so I'm gonna pick buckets see because I'm [00:29:20] gonna pick buckets see because I'm trying to maximize under this knowledge [00:29:22] trying to maximize under this knowledge that the other agent is stochastic [00:29:24] that the other agent is stochastic reacting and and then we're calling this [00:29:29] reacting and and then we're calling this the value of expecting max policy and [00:29:31] the value of expecting max policy and the value of expecting max policy from [00:29:33] the value of expecting max policy from the start state is equal to 5 right [00:29:36] the start state is equal to 5 right because that's that's [00:29:37] because that's that's you did I think I'm gonna get posture - [00:29:41] yes this is assuming I know my opponents [00:29:44] yes this is assuming I know my opponents policy and then I'm following you Nick I [00:29:46] policy and then I'm following you Nick I guess so I'm maximizing my own [00:29:49] guess so I'm maximizing my own you took my own value knowing that my [00:29:51] you took my own value knowing that my opponent is following this policy and [00:29:53] opponent is following this policy and what the opponent would do an [00:29:54] what the opponent would do an expectation alright so and then this is [00:29:57] expectation alright so and then this is the this is the recurrence that you [00:29:59] the this is the recurrence that you would get we would just update the 
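As a concrete sketch of the recurrence just described — a minimal reconstruction, not the course's actual code — here is the expectimax computation for the bucket example, with buckets A = {-50, 50}, B = {1, 3}, C = {-5, 15} and an opponent known to pick either number with probability 1/2:

```python
# Bucket contents from the lecture's example; the opponent's known policy
# picks either number in the chosen bucket with probability 1/2.
buckets = {'A': [-50, 50], 'B': [1, 3], 'C': [-5, 15]}

def opponent_value(bucket):
    # Opponent node: expected utility under the known stochastic policy.
    return sum(0.5 * u for u in buckets[bucket])

def expectimax_action():
    # Agent node: maximize over choices, given the opponent's known policy.
    return max(buckets, key=opponent_value)

print(expectimax_action(), opponent_value(expectimax_action()))  # C 5.0
```

The expected bucket values come out to 0, 2 and 5, so the agent picks bucket C with value 5, matching the tree on the board.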
[00:30:01] If the agent is playing, then we maximize the value of expectimax. Okay. Now, in general I don't know the policy of my opponent, right? In general I just know what gives me this payoff. So if that is the case, what should we do? One thing we could do is assume the worst case. You could say: oh, the opponent is trying to get me, they're going to play the worst-case scenario, they're trying to minimize my value. That's the first thing to try — and we'll talk about whether that is always the best thing to do a little bit later in the lecture — but for now, if I know nothing about my opponent, I can just assume my opponent is acting adversarially against me. And that introduces the idea of minimax, as opposed to expectimax.
[00:30:55] So what would minimax be? In the case of a minimax policy, what I am assuming is: I am the agent, trying to maximize my own value, and I'm assuming my opponent is acting adversarially — my opponent is really trying to minimize my value. What that means is: from this bucket I'm going to get minus 50, from this one I'm going to get 1, and from this one I'm going to get minus 5. Under that assumption I'm going to pick the second bucket, because that gives me the highest value. So that is a minimax policy. [00:31:31] How would I change my recurrence if I were to play minimax? I'm going to call it V of minimax — let's look at the V of minimax of a state. The recurrence is going to be over V of minimax, so let me change that. [00:31:52] If the agent is playing, the agent is still trying to maximize the value, so that is all good. What if the opponent is playing? The opponent is going to minimize, right? I don't have access to pi of the opponent, so I'm going to remove this and say: the opponent takes the action that minimizes the value of the successor of s and a. And this is how you would compute the value of the minimax policy. [00:32:43] What happens if the adversary is not always adversarial? In that case you have another stochastic policy that defines what the opponent is doing, and if you have access to it you can do something similar to expectimax. If you don't have access to that, maybe you would want to act worst-case and assume they're always trying to minimize. But that's prior knowledge that you have, and it allows you to act better, or maybe evaluate the value better for every state.
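Written out — my reconstruction of the board, using the course's usual IsEnd/Utility/Succ notation — the recurrence being described is:

```latex
V_{\text{minimax}}(s) =
\begin{cases}
\text{Utility}(s) & \text{if } \text{IsEnd}(s)\\[2pt]
\max_{a \in \text{Actions}(s)} V_{\text{minimax}}(\text{Succ}(s,a)) & \text{if } \text{Player}(s) = \text{agent}\\[2pt]
\min_{a \in \text{Actions}(s)} V_{\text{minimax}}(\text{Succ}(s,a)) & \text{if } \text{Player}(s) = \text{opp}
\end{cases}
```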
[00:33:07] So we will talk about evaluation functions a little bit later in the lecture, and maybe that knowledge can inform your evaluation function. All right, so here the value of minimax from the start state is going to be 1, right? Does everyone see that? I'm assuming my opponent is acting adversarially, so we have minus 50, 1 and minus 5; if I am maximizing, the best I can get is 1. That's how we compute V of minimax. [00:33:36] And there is really no analogy to this in the MDP setting, because in the MDP setting we don't have this game — we don't have this opponent playing against us — and this is the recurrence that you're going to get, which is what we already have on the board. [00:33:49] Okay, so what would the policy be? The policy is just going to be the argmax of this V of minimax. If you want to know what the policy of your agent should be, that's pi max: it's the argmax of V of minimax over the successors of that state. And if you want to know what the policy of your opponent at state s should be, that's the argmin of V of minimax, which is intuitive. So that way you can actually figure out what the actual action should be. [00:34:20] All right, so let's go back to this example, this halving game. What we want to do is actually code up what a minimax policy would do in this setting, and maybe we can play against a minimax policy after that. [00:34:36] So what would a minimax policy do? It's a policy, so it's going to be a function of states: give it the state, and we just write the recursion that we have on the board. So we're recursing over the state. If you're in an end state, then what are we returning? Just the utility.
[00:34:55] Okay, so we're returning the utility of that state — there are no actions there. If you're not in an end state, then you're either maximizing or minimizing over a set of choices, so let's actually create those choices so we can just call max and min on them. For the choices we iterate over all actions that we have. [00:35:23] And what is each choice going to be, exactly? It's going to be a recursion over the successor state: we recurse over the game's successor of the state and action. I'm going to return the action here too, because I just want to get the policy later. This recursion returns a value and an action, so I get the value from the first component and the action from the second. [00:35:49] Okay: if player is plus 1, that's the agent, and the agent should maximize over the choices; if player is minus 1, that's the opponent, and the opponent should try to minimize over these choices. That's pretty much the recursion we have on the board, and that's our recursive function. [00:36:13] So we recurse over our state, and that gives us a value and also an action. Let's just print things out so you can refer to them: minimax gives us an action, and it tells us the value that you can get. And then it's a policy, so let's just return the action. [00:36:42] Okay, so now I'm going to say the plus-one agent is still a human policy, and it's playing against a minimax policy. All right — who wants to play with this? It's a little scarier to play against a minimax policy. [00:37:01] All right, so let's do this: python... You are the agent, you're player one, you're starting from 15, what do you want to do? ... So you just lost the game. Why do I know you lost the game? Now it's player minus one, we are at 7, and the minimax policy took action minus, so we're at 6, and the value of the game is minus infinity. You're playing against a minimax policy and you're already getting minus infinity, so you just lost. Anyone want to try this again? [00:37:53] Subtract — okay, so you can win, right? The value is infinity right now. And then the minimax policy also did a minus, so we are at 13 right now. It's your turn, you're at 13... and you just lost the game again, so minus infinity. Yeah — actually you need to alternate between the actions; I think that is the best policy. But hopefully this gives you a sense of how this runs. The code is online, so feel free to play with it and figure out the best policy to use. All right.
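Since the transcript doesn't reproduce the code itself, here is a minimal sketch of what the halving game and minimax policy might look like. All names here are my own, and I'm assuming the convention that whoever moves the number to 0 wins (utility plus infinity if that's the agent, minus infinity otherwise); the course's actual code may use a different convention:

```python
class HalvingGame:
    """Start from n; each turn a player subtracts 1 ('-') or halves ('//').
    Assumption in this sketch: whoever moves the number to 0 wins."""
    def __init__(self, n):
        self.n = n

    def start_state(self):
        return (+1, self.n)          # (player to move, current number)

    def is_end(self, state):
        return state[1] == 0

    def utility(self, state):
        player, number = state
        assert number == 0
        # The previous mover (-player) brought the number to 0 and wins.
        return float('inf') if -player == +1 else float('-inf')

    def actions(self, state):
        return ['-', '//']

    def player(self, state):
        return state[0]

    def succ(self, state, action):
        player, number = state
        if action == '-':
            return (-player, number - 1)
        return (-player, number // 2)

def minimax(game, state):
    """The board recurrence: returns (value, best action) at `state`."""
    if game.is_end(state):
        return (game.utility(state), None)
    # One (value, action) choice per legal action, recursing on the successor.
    choices = [(minimax(game, game.succ(state, a))[0], a)
               for a in game.actions(state)]
    if game.player(state) == +1:
        return max(choices)          # agent maximizes
    return min(choices)              # opponent minimizes

def minimax_policy(game, state):
    value, action = minimax(game, state)
    print(f'minimax says action = {action}, value = {value}')
    return action

game = HalvingGame(15)
minimax_policy(game, game.start_state())
```

Taking max and min over (value, action) tuples returns both the best value and an action achieving it in one pass, which is what lets the same recursion serve as both an evaluator and a policy.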
[00:38:37] So okay, that was the minimax policy, and this is the recurrence we get for a minimax policy. Now I want to spend a little bit of time talking about some properties of this minimax policy. We've talked about two types of policies so far, right? We've talked about expectimax, which basically says: I, as the agent, am trying to maximize, but I know what my opponent is going to do, so I assume my opponent does whatever it does and I maximize based on that. [00:39:10] I'm going to refer to that as pi of expectimax — everything in red is for the agent, everything in blue is for the opponent. So the agent is following this policy, which says: I'm going to maximize, assuming my opponent is doing whatever it does. Here I'm calling pi 7 some opponent policy; it could be anything, but let's say the opponent is playing pi 7, I maximize based on that, and the value is the value of expectimax we just talked about. [00:39:42] The other value we just talked about is the value of minimax, which says: I am the agent, I am going to maximize assuming the opponent is going to minimize — and the opponent actually does minimize, following pi min. Okay, so these are the two values we have talked about so far. I want to talk a little bit about their properties, but before that — sorry? [00:40:01] [student] Wait, could we kind of mix the two together? Like, heighten the probability of taking the minimum: in expectimax we have a probability distribution over the actions, right? So why don't we just take the action that minimizes whatever our reward is and give it a higher weight? [00:40:17] I didn't fully follow — are you coming up with a new policy that you think would be better, somewhere between expectimax and minimax? This table might address that, because it considers four different cases, not just the two, so it might actually relate to what you were proposing. Let's go through it first, and then come back if it doesn't answer that. [00:40:52] All right, so I want to talk about a setting. This table is actually not that confusing, but it can get confusing, so do pay attention to this part. Maybe I'll write it over there: I'm going to use red for the agent policy and blue for the opponent policy. [00:41:25] Then for the agent we have pi max — the agent could play pi max; what does that mean again? I'm going to maximize, assuming you're going to minimize. And the agent could play pi expectimax of some policy pi 7 — I'm going to put 7 here — which means: I'm going to maximize, assuming you're going to follow this pi 7. So these are the things the agent can do. [00:41:53] And then there are things my opponent can do; I'm going to write those here. My opponent can follow pi min, which is: I'm just going to minimize. Or my opponent could follow some other policy pi 7 — let's say pi 7 in the bucket example is just acting stochastically, so half the time pick one number, half the time pick the other. Okay, so that is what we have. [00:42:18] I'm going to draw my tree so we can go over examples of this too. This was the bucket example: we start with minus 50 and 50 in bucket A, 1 and 3 in bucket B, and minus 5 and 15 in bucket C.
[00:42:26] bucket example we started at minus 50 and 50 in bucket a 1 and 3 in bucket B [00:42:31] and 50 in bucket a 1 and 3 in bucket B minus 5 and 15 in bucket C ok so this [00:42:34] minus 5 and 15 in bucket C ok so this was my buckets example I'm actually [00:42:36] was my buckets example I'm actually going to talk about it so alright so I'm [00:42:39] going to talk about it so alright so I'm going to talk about a bunch of [00:42:40] going to talk about a bunch of properties of me of Pi Max and timing [00:42:46] properties of me of Pi Max and timing which is what we have been referring to [00:42:48] which is what we have been referring to as the minimax value okay so I want to [00:42:52] as the minimax value okay so I want to talk about this a little bit so the [00:42:55] talk about this a little bit so the first property that that we can have is [00:42:57] first property that that we can have is is that V of Pi max time in is it is [00:43:10] is that V of Pi max time in is it is going to be an upper bound of any other [00:43:18] going to be an upper bound of any other value of any other policy pi of I'm [00:43:22] value of any other policy pi of I'm gonna just write PI of expecting max for [00:43:23] gonna just write PI of expecting max for any other policy for the agent assuming [00:43:27] any other policy for the agent assuming that my opponent is playing as a [00:43:30] that my opponent is playing as a minimizer okay so so what I'm writing so [00:43:35] minimizer okay so so what I'm writing so what I'm writing here is is the value is [00:43:36] what I'm writing here is is the value is going to be an upper bound of any other [00:43:39] going to be an upper bound of any other value if my agent decides to do anything [00:43:41] value if my agent decides to do anything else under the assumption that my [00:43:44] else under the assumption that my opponent is a minimizer so my opponent [00:43:46] opponent is a minimizer so my opponent is really trying to get me 
if my [00:43:47] is really trying to get me if my opponent is really trying to get me then [00:43:49] opponent is really trying to get me then the best thing I can do is to maximize [00:43:51] the best thing I can do is to maximize okay so that's kind of intuitive right [00:43:53] okay so that's kind of intuitive right that's an upper bound let's look at that [00:43:55] that's an upper bound let's look at that example so what is PI V of Pi mix PI max [00:43:59] example so what is PI V of Pi mix PI max and PI min so so we just talked about [00:44:01] and PI min so so we just talked about that right so if this guy is a minimizer [00:44:03] that right so if this guy is a minimizer we're gonna get minus 50 here 1 here [00:44:06] we're gonna get minus 50 here 1 here minus 5 here if this guy is a Maximizer [00:44:09] minus 5 here if this guy is a Maximizer what is the value I'm gonna get get it 1 [00:44:12] what is the value I'm gonna get get it 1 right I'm gonna go down here and then [00:44:13] right I'm gonna go down here and then I'm gonna get one so V of PI max and [00:44:17] I'm gonna get one so V of PI max and timing is just equal to 1 that is this [00:44:19] timing is just equal to 1 that is this value that is just equal to 1 okay what [00:44:23] value that is just equal to 1 okay what is this saying is that this is going to [00:44:26] is this saying is that this is going to be greater than [00:44:28] be greater than maybe the setting where my opponent [00:44:32] maybe the setting where my opponent sorry my my agent is following expecting [00:44:34] sorry my my agent is following expecting max and my opponent is still doing [00:44:36] max and my opponent is still doing timing so so what would this correspond [00:44:38] timing so so what would this correspond to what would this value correspond to [00:44:40] to what would this value correspond to so this is a value which says well I'm [00:44:45] so this is a value which says well I'm going to take an action 
assuming my [00:44:47] going to take an action assuming my opponent is acting as stochastically if [00:44:50] opponent is acting as stochastically if my opponent is acting stochastically I'm [00:44:52] my opponent is acting stochastically I'm gonna get zero here I'm gonna get two [00:44:54] gonna get zero here I'm gonna get two here I'm gonna get five here if I'm [00:44:56] here I'm gonna get five here if I'm assuming that and I'm trying to maximize [00:44:58] assuming that and I'm trying to maximize my own my own value which trout do I go [00:45:01] my own my own value which trout do I go I'm gonna go at this trout but it turns [00:45:04] I'm gonna go at this trout but it turns out that my opponent was not doing that [00:45:06] out that my opponent was not doing that my opponent was actually a minimizer so [00:45:09] my opponent was actually a minimizer so if my opponent was actually a minimizer [00:45:10] if my opponent was actually a minimizer and I went this route my opponent is [00:45:14] and I went this route my opponent is going to give me minus 5 so the value [00:45:16] going to give me minus 5 so the value I'm gonna end up getting is minus 5 so [00:45:19] I'm gonna end up getting is minus 5 so this is equal to minus 5 this is equal [00:45:23] this is equal to minus 5 this is equal to minus y so so far I've shown that [00:45:27] to minus y so so far I've shown that this guy is greater than this guy all [00:45:32] this guy is greater than this guy all right so that's the first property first [00:45:34] right so that's the first property first property is if my opponent is terrible [00:45:35] property is if my opponent is terrible and is trying to get me best thing I can [00:45:37] and is trying to get me best thing I can do is to maximize I shouldn't do [00:45:39] do is to maximize I shouldn't do anything else okay the second property [00:45:42] anything else okay the second property is is that this is V of Pi knocks again [00:45:47] is is that this is V of 
[00:45:52] V of pi max and pi min is now a lower bound for the setting where your agent is maximizing assuming your opponent is minimizing, but your opponent was actually not minimizing; your opponent was following pi 7. So what this says is: if you're trying to maximize assuming your opponent is always minimizing, then you'll come up with a lower bound, and if your opponent ends up doing something else, you can always do better than this lower bound. [00:46:25] Okay, so what is this V equal to? Well, we just showed that it is equal to 1; that is this value. And what does this correspond to? This is the value of pi max, which is: I'm going to assume you're trying to get me, and if I assume you're trying to get me, I'm going to go this route, because that is the thing that gives me the highest value.
[00:46:46] But you're not trying to get me; you're following pi 7. If you're following pi 7, you're just going to give me one half the time and three half the time, and that corresponds to two. So I'm going to get value two instead of value one; this is actually equal to two in this case, and this corresponds to this value in the table, which is: the agent is following a maximizer assuming the opponent is a minimizer, but the opponent was not a minimizer; the opponent was just following pi 7, and this is just equal to two. [00:47:17] Okay, so far the things I've shown are actually very intuitive. They seem a little complicated, but they're very intuitive. What I've shown is that this value of minimax is an upper bound if you're assuming your opponent is a terrible opponent: it's going to be an upper bound because the best thing I can do is maximize. I've also shown it's a lower bound if my opponent is not as bad.
[00:47:40] So that's what I've shown so far. [00:47:45] [Student question, partly inaudible: is the opponent's policy completely hidden from the agent?] Yeah, so here the agent actually doesn't see what the opponent does, right? Even in the expectimax case, it thinks the opponent is going to follow pi 7, but maybe the opponent follows pi 7 and maybe not. So when we talk about expectimax and minimax, it's always the case that the agent doesn't actually see what the opponent does, but the agent can reason about what the opponent might do. [00:48:13] I'm going to talk about one more property, and this last property, which actually goes back to your question, basically says: if you know something about your opponent, then you shouldn't follow that minimax policy; you should actually do the thing that uses some knowledge of what your opponent does.
[00:48:29] So that basically says: this V of pi max and some pi opponent, where you know something about pi opponent (you know that the opponent is playing pi 7), is going to be less than or equal to the case where you are following pi expectimax of 7 and the opponent actually follows pi 7. [00:49:02] So what is this last inequality saying? Well, it's saying that the case where you're trying to maximize and you think your opponent is minimizing, but your opponent is actually not minimizing, has a value less than the case where you're maximizing under some knowledge of your opponent's policy, and your opponent's policy actually ended up doing that. [00:49:23] So the first term is always the agent, and the second term is always the opponent. This first value we have already computed;
[00:49:30] that's equal to 2. And what is this other value saying? It's saying you are going to maximize assuming your opponent is stochastic. If I'm assuming my opponent is stochastic, then I'm assuming this is 0, this is 2, this is 5, and I'm trying to maximize, so which route should I go? I should go this route, because that gives me 5. So this is the agent thinking the opponent is going to be stochastic, thinking it's going to get 5, and the opponent actually ends up following pi 7, which is that stochastic policy, so we are actually going to get 5; this is equal to 5. [00:50:11] And this is the last inequality that we have: V of pi expectimax of 7 and pi 7 is greater than or equal to V of pi max and pi 7, and we just showed this is equal to 5 for this example. Okay, all right. [00:50:37] [Student question, partly inaudible, about whether the opponent's policy has to be stochastic.]
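The chain of values worked through above can be checked numerically. This is a minimal sketch, assuming the three-bin payoffs from the running example, {-50, 50}, {1, 3}, and {-5, 15} (these reproduce the 0/2/5 averages and the -5, 1, 2, 5 values quoted in the lecture); the variable names are my own.

```python
# Three-bin game: the agent picks a bin, then the opponent picks one
# of the two numbers in it (assumed payoffs from the lecture example).
bins = [[-50, 50], [1, 3], [-5, 15]]

min_vals = [min(b) for b in bins]        # opponent minimizes
avg_vals = [sum(b) / 2 for b in bins]    # opponent plays pi_7 (uniform coin)

pi_max = min_vals.index(max(min_vals))       # best bin vs. a minimizer -> bin 1
pi_exptmax7 = avg_vals.index(max(avg_vals))  # best bin vs. pi_7        -> bin 2

V_exptmax_min = min_vals[pi_exptmax7]    # wrong model, adversarial foe: -5
V_max_min = min_vals[pi_max]             # minimax value:                 1
V_max_7 = avg_vals[pi_max]               # minimax policy vs. pi_7:       2.0
V_exptmax_7 = avg_vals[pi_exptmax7]      # correct model of pi_7:         5.0

# The chain of properties from the lecture holds on this example:
assert V_exptmax_min <= V_max_min <= V_max_7 <= V_exptmax_7
```

Each property follows the same pattern: the first policy in V(., .) is the agent's, the second is what the opponent actually does.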
[00:50:50] So if you know something about the opponent, it doesn't have to be stochastic. Here I knew that the opponent was following this stochastic policy of one half, one half; I might instead have known that the opponent is following a deterministic policy and always picks the left one, and I could have followed the same kind of expectimax policy under that knowledge. It could be anything else. The whole idea of expectimax is that I have some knowledge of what the policy of the opponent is; it could be a stochastic policy, it could be a deterministic policy, and under that knowledge I maximize. [00:51:19] [Student: is the bottom-right value always greater than the bottom-left one?] Yeah, so the question is whether that inequality always holds. We have this chain of inequalities, so transitively this quantity is always greater than that one, and that kind of makes sense, right?
[00:51:36] This last one kind of makes sense: it's basically saying that if you're following expectimax, and you know something about your opponent, and your opponent actually ended up doing that, then your value should be greater than pretty much anything, because you knew something about the opponent and you played having that knowledge. [00:51:53] [Student question, partly inaudible, about what it means to know the opponent's policy.] It's knowing what actions they're going to take, right? Here I knew what the opponent would do: I knew that half the time they're going to take this one and half the time they're going to take the other one, and then I used that knowledge. [00:52:10] [Student: what is the expectimax policy given that your opponent is following a minimizer policy; if your opponent is following pi min, is it the same as doing minimax?]
[00:52:24] So the expectimax policy is this policy here: the expectimax policy assumes your opponent is following pi opponent and assumes that it has access to pi opponent, so it ends up doing this sum over here. [00:52:38] Right, so you're saying: if pi opponent is actually pi min, do they end up being equal to each other? Yes. If you know your opponent is following pi min, it just becomes minimax: expectimax against a known minimizer is exactly minimax. All right, so I'm going to move ahead a little bit. [00:52:58] This is what we have already talked about. Okay, so a few other things about modifying this game. We have talked about this game, and we have talked about properties of this game. There's a simple modification one can do, which is to bring nature in. There was a question earlier about whether there is any chance here, and yes, you can actually bring chance into this setup.
[00:53:19] So let's say that you have the same game as before: you're choosing one of the three bins, and after choosing one of the three bins, you flip a coin, and if heads comes up, you move one bin to the left, with wraparound. What this means is that 50% of the time tails comes up and you're not changing anything, you have the same setup; 50% of the time you get heads, and in those cases you're just going to get a neighboring bin instead of your chosen bin. [00:53:47] So you're adding this notion of chance here, and it's kind of acting as a new player, but it's not actually making things that much more complicated. What happens is that, in some sense, we have a policy of the coin, which is nature here: half the time I get 0 and don't change anything, and half the time I just get the neighboring bin instead of my main bin.
[00:54:09] And then I get this new tree, where I have a whole new level for where the chance player goes. So now we have max nodes, we have min nodes, and we also have these chance nodes here, and the chance nodes sometimes take me to the original bucket and 50% of the time take me to a neighboring bucket. But the whole story stays the same; nothing changes. You can still compute value functions, you can still push the value functions further up the tree; it's the same sort of recurrence, and nothing fundamental changes. It just feels like there are three things playing now. [00:54:43] So this is actually called expectiminimax. The value of expectiminimax here, in this case for example, is minus two, because there is a min node for the opponent, there's an expectation node for what nature does, and then there is a max node for what the agent should do.
[00:55:01] That's why it's called expectiminimax, and then you can actually compute the value the same way. [00:55:09] [Student: so there are two players; I pick a bin, then you flip the coin and it shifts left or not, and then the opponent gets to take the number?] Yes, there are still two players, and then there's the third thing, the coin. [00:55:26] All right, so the way to formalize this is: you have players, so you have an agent, you have an opponent, and you have the coin, and then the recurrence changes a little bit. The recurrence we had for minimax was just the max and the min, and it would return the utility if you're at an end state. Now, if it is the coin's turn, we just do a sum, an expectation over the policy of the coin, which is what we were doing in expectiminimax; we just have a new return value for when the coin plays.
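The expectiminimax recurrence just described can be sketched as a short recursive function. This is a minimal sketch: the nested-dict node format and the particular tree are my own illustration, with a max-over-chance-over-min shape like the coin-flip game and numbers chosen so the root value comes out to -2, matching the value quoted above.

```python
# Expectiminimax over a small game tree. Node format (my own, for this
# sketch): a leaf holds a utility; "max"/"min" nodes hold children;
# an "exp" (chance) node holds (probability, child) pairs.

def expectiminimax(node):
    kind = node["type"]
    if kind == "leaf":
        return node["utility"]
    children = node["children"]
    if kind == "max":                      # agent's turn: maximize
        return max(expectiminimax(c) for c in children)
    if kind == "min":                      # opponent's turn: minimize
        return min(expectiminimax(c) for c in children)
    if kind == "exp":                      # nature's turn: expectation
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node type {kind!r}")

def leaf(u):
    return {"type": "leaf", "utility": u}

# A max -> chance -> min tree, the same shape as the coin-flip game.
tree = {"type": "max", "children": [
    {"type": "exp", "children": [
        (0.5, {"type": "min", "children": [leaf(1), leaf(3)]}),
        (0.5, {"type": "min", "children": [leaf(-5), leaf(15)]}),
    ]},
    {"type": "exp", "children": [
        (0.5, {"type": "min", "children": [leaf(-50), leaf(50)]}),
        (0.5, {"type": "min", "children": [leaf(1), leaf(3)]}),
    ]},
]}
print(expectiminimax(tree))  # -2.0: max(0.5*1 + 0.5*(-5), 0.5*(-50) + 0.5*1)
```

The only change from plain minimax is the "exp" branch; max and min nodes are untouched, which is why nothing fundamental changes in the recurrence.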
[00:55:58] So everything here follows naturally in terms of what we'd expect. [00:56:03] Okay, all right, so the summary so far: we've been talking about max nodes, we've been talking about chance nodes, like what happens if you have a coin in there, and also these min nodes, and basically we've been talking about composing these sorts of nodes together and creating a minimax game or an expectimax game. And for the value function, we just do the usual recurrence that we have been doing in this class, going from the utility to compute this expected utility value for all the nodes that we have. [00:56:35] There might be other scenarios that you might want to think about, for example for your projects; in general, there are other variations of games that you might want to think about.
[00:56:46] For example, what if you're playing with multiple opponents? So far we have talked about the two-player setting, where we have one opponent and one agent, but if you have multiple opponents, you can think about how the tree changes in those settings. Or, for example, the turn-taking aspect of it: what if the game is simultaneous rather than turn-taking? Or you can imagine settings where you have some actions that allow you to take an extra turn, so you take two turns and then the next person takes a turn. You should think about some of these; some of them come up in the homework. Think about variations of games in general; they're kind of fun. [00:57:21] So, to talk a little bit about the computational aspects of this: this is pretty bad. We are talking about a game tree, which is similar to tree search, so we're taking a tree search approach.
[00:57:35] If you remember tree search and the algorithms we were using there: if you have a branching factor of b and some depth of d, then in terms of time it's exponential, on the order of b to the 2d in this case. I'm using 2d because the agent plays and then the opponent plays; that's how I'm counting it, so for every d we have two plies. [00:58:03] In terms of space it's order d, and in terms of time it's exponential, which is pretty bad. For a game like chess, for example, the branching factor is around 35 and the depth is around 50, so if you compute b to the 2d, it's on the order of the number of atoms in the universe. That's not doable; we are not able to use any of these methods directly. So how do we make things faster?
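The chess estimate above can be sanity-checked on a log scale. A quick sketch, using b = 35 and d = 50 as the rough numbers from the lecture:

```python
import math

b, d = 35, 50                          # rough chess numbers from the lecture
log10_nodes = 2 * d * math.log10(b)    # log10 of b^(2d)
print(round(log10_nodes))              # about 154, i.e. roughly 10^154 nodes
# For comparison, the observable universe has on the order of 10^80 atoms.
```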
[00:58:33] There are two approaches that we talk about in this class to make things faster. The first approach is using an evaluation function: we can use domain-specific knowledge about the game to define almost like features of the game, in order to approximate this value function at a particular state. I'm going to talk about that a little bit. [00:58:56] And then the other approach, which is kind of simple and kind of nice, is called alpha-beta pruning. The alpha-beta pruning approach basically gets rid of part of the tree if it realizes you don't need to go down that part of the subtree. So it's a pruning approach that doesn't explore all of the tree; it only explores parts of the tree. We're going to talk about both of them.
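Alpha-beta pruning is covered in detail later in the course; as a preview, the idea of skipping subtrees can be sketched like this. A minimal sketch of my own, on a nested-list tree where a number is a terminal utility and a list holds a node's children:

```python
def alphabeta(node, alpha, beta, is_max):
    # alpha: best value the maximizer can guarantee so far;
    # beta: best value the minimizer can guarantee so far.
    if isinstance(node, (int, float)):
        return node
    if is_max:
        v = float("-inf")
        for c in node:
            v = max(v, alphabeta(c, alpha, beta, False))
            alpha = max(alpha, v)
            if alpha >= beta:   # the minimizer would never allow this branch
                break           # prune the remaining children
        return v
    else:
        v = float("inf")
        for c in node:
            v = min(v, alphabeta(c, alpha, beta, True))
            beta = min(beta, v)
            if alpha >= beta:
                break
        return v

tree = [[-50, 50], [1, 3], [-5, 15]]  # the three-bin game as a tree
print(alphabeta(tree, float("-inf"), float("inf"), True))  # 1, same as minimax
```

On this tiny example, once the third bin reveals the -5 leaf, its remaining leaf is never examined: the answer is identical to minimax, just computed over fewer nodes.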
[00:59:22] All right, so evaluation functions; let's talk about that. The breadth and depth of the game can be really large, and that's not great. One approach to the problem is to limit the depth: instead of exploring everything in the tree, just limit the depth, search down to that particular depth, and when you get there, call an evaluation function. [00:59:46] If you were to search the full tree, this was the recursion that we had, that we've talked about: if you're doing a minimax approach, this is the recursion you've got to do, going over all the states and actions, over the whole tree. But if you're using a limited-depth tree search approach, what you can do is have this depth d and then decrement d every time you go through an agent move and an opponent move as you go down the tree.
[01:00:13] At some point d becomes zero, so you get to some particular depth of the tree, and when d becomes zero, you're going to call an evaluation function on the state that you reach. [01:00:23] And this evaluation function is of almost the same form as the future cost we were talking about for search problems, right? If you knew it exactly, then you'd be done, but you don't know exactly what it is, because if you knew that, you would have solved the whole tree search problem. In general, though, you can have some sort of weak estimate of what the future value would be. So an evaluation function Eval(s) is a weak estimate of V_minimax(s); it's a weak estimate of your value function. [01:01:04] The analogy to that is FutureCost(s) in search problems. So how do we come up with an evaluation function?
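The depth-limited recurrence just described can be sketched as follows. A minimal sketch in which the nested-list tree format and the crude leaf-averaging Eval are my own illustration, not the lecture's:

```python
# Depth-limited minimax: d counts agent/opponent ply pairs and is
# decremented after each opponent reply; at d == 0 we call Eval instead
# of recursing. A number is a terminal utility; a list holds children.

def minimax_dl(node, d, is_max, eval_fn):
    if isinstance(node, (int, float)):
        return node                      # end state: return the utility
    if d == 0:
        return eval_fn(node)             # depth cutoff: weak estimate
    next_d = d if is_max else d - 1      # decrement after the opponent moves
    vals = [minimax_dl(c, next_d, not is_max, eval_fn) for c in node]
    return max(vals) if is_max else min(vals)

def avg_leaves(node):
    """A crude Eval(s): the average of all leaf utilities under the node."""
    if isinstance(node, (int, float)):
        return node
    vals = [avg_leaves(c) for c in node]
    return sum(vals) / len(vals)

tree = [[-50, 50], [1, 3], [-5, 15]]          # the three-bin game as a tree
print(minimax_dl(tree, 1, True, avg_leaves))  # 1: full depth reaches the leaves
print(minimax_dl(tree, 0, True, avg_leaves))  # ~2.33: Eval guess at the root
```

With d large enough to reach the leaves this is exact minimax; with d = 0 the answer is only as good as the evaluation function, which is exactly the trade-off being described.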
So how do we come up with an evaluation function? [01:01:12] We do it in a manner similar to what we did in the learning lecture, where we come up with features and weights for those features. If I'm playing chess, the way we play is that we think about a set of actions we could take and where we'd end up, and based on where we end up, we evaluate how good that board is. We have some notion of features, of how good the board would be from that point on, and that lets us evaluate which action to pick. When we play chess, that's roughly what we do: we pick a couple of actions and see what the board would look like after taking them. An evaluation function does the same thing: it tries to figure out what the things are that we should care about in a specific game, in this case chess, and then gives values to them.
[01:01:57] So it might be things like the number of pieces we have, the mobility of those pieces, whether our king is safe, or whether we have central control. For example, for the pieces, we can look at the difference between the number of pieces we have and the number our opponent has. The number of kings I have versus the number of kings my opponent has seems really important, because if I don't have a king and my opponent does, then I've lost the game, so you might put a really large weight on that feature. You might also care about the difference in the number of pawns, or queens, or the other types of pieces on the board; that lets you think about how good the board is. Or you can look at the number of legal moves you have and the number of legal moves your opponent has, which gives you some notion of the mobility of that state.
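Features and weights like these can be combined into a linear evaluation function. The `Board` stub, the feature set, and the weights below are all illustrative assumptions, not a real chess evaluator:

```python
# A hand-crafted linear evaluation: weighted features of the position.
# Board, the features, and the weights are illustrative assumptions.

PIECE_WEIGHTS = {"K": 1e6, "Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

class Board:
    """Minimal stand-in for a position: piece counts and legal-move counts."""
    def __init__(self, my_pieces, opp_pieces, my_moves, opp_moves):
        self.my_pieces, self.opp_pieces = my_pieces, opp_pieces
        self.my_moves, self.opp_moves = my_moves, opp_moves

def evaluate(board):
    # Material: huge weight on the king (losing it loses the game),
    # then queens, rooks, minor pieces, pawns.
    material = sum(w * (board.my_pieces.get(p, 0) - board.opp_pieces.get(p, 0))
                   for p, w in PIECE_WEIGHTS.items())
    # Mobility: difference in the number of legal moves.
    mobility = 0.1 * (board.my_moves - board.opp_moves)
    return material + mobility

# Up a queen but slightly less mobile: still clearly ahead.
b = Board({"K": 1, "Q": 1, "P": 8}, {"K": 1, "P": 8}, my_moves=30, opp_moves=35)
print(evaluate(b))  # 9 - 0.5 = 8.5
```

The lecture's point about learning applies here: these weights are hand-set, but they could just as well be fit from data.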
[01:02:43] OK, so, summary so far: O(b^(2d)) is pretty bad, and an evaluation function basically tries to estimate the minimax value using some domain knowledge. Unlike A*, we actually don't have any guarantees on the error of this sort of approximation; but it is an approximation, people use it, and it's pretty good. We'll talk a little bit next time about what sort of weights we should pick for each of these features; you should think learning when you think about what weights we are using. All right, so now I want to spend a bit of time on alpha-beta pruning, because this is important.
[01:03:41] The concept of alpha-beta pruning is also pretty simple, but I think it's one of those things where you should pay close attention to really get what is happening. So let's say that you want to choose between some bucket A and some bucket B, and you want the maximum value. You know that the values in A fall into the range three to five and the values in B fall into five to ten, so the ranges don't really intersect. In that case, if you're picking a maximum, you shouldn't care about the rest of bucket A, because you already know you're happy with B; you shouldn't even look at A. So the underlying concept of alpha-beta pruning is maintaining a lower bound and an upper bound on values,
[01:04:32] and if the intervals don't overlap, basically dropping the part of the subtree that you don't need to work on, because there's no overlap there. So here's an example. Let's say we have these max nodes and min nodes, and you go down and see a three; this is a min node, so you're going to get three here. So when I get to the max node here, what I know is that the max node is going to get three or higher. That's something I know without even looking at anything on the other side: having looked at the subtree on the left, I already know this max node should get three or higher. OK, so then when I go down to this min node and I see a two, I know this is a min node, so it's going to get a value that's less than or equal to two, and less than or equal to two has no overlap with
[01:05:28] greater than or equal to three, so I shouldn't worry about that subtree. Did everyone see that? Maybe let me draw it here. [01:05:42] That's basically the whole concept of what happens in alpha-beta pruning. I have this max node; this child was three and this one was five, so I found that this node is three. The parent is a max node: whatever it gets is going to be greater than or equal to three, because it has already seen a three, so it's not going to take any value less than three. We know whatever value we get at this max node is going to be three or higher. Then I go down here and I see a two. It's a min node: whatever it gets is going to be less than or equal to two, so less than or equal to two is the value that would get popped up here. And I already know that less than or equal to two has no overlap with three or greater, so I don't
need to worry about this like I [01:06:37] even need to worry about this like I like I can completely ignore this side [01:06:40] like I can completely ignore this side of the tree I don't need to know [01:06:41] of the tree I don't need to know whatever is happening down here I don't [01:06:43] whatever is happening down here I don't even need to look at that okay cuz cuz I [01:06:46] even need to look at that okay cuz cuz I like this value should be greater than [01:06:48] like this value should be greater than applause sorry [01:06:57] now minimum so it's a minimum it's a [01:07:00] now minimum so it's a minimum it's a minimum note right so it's going to be [01:07:03] minimum note right so it's going to be your less than or equal it's a mid note [01:07:09] your less than or equal it's a mid note so I saw two if I see ten here or twenty [01:07:12] so I saw two if I see ten here or twenty here like I'm not gonna pick that like [01:07:13] here like I'm not gonna pick that like it's two or all right so yeah so if it [01:07:22] it's two or all right so yeah so if it is 10 or 100 or whatever substrate is [01:07:24] is 10 or 100 or whatever substrate is there like we're not gonna look at that [01:07:26] there like we're not gonna look at that so that that is kind of the whole [01:07:28] so that that is kind of the whole concept [01:07:30] concept all right so okay so the key idea of [01:07:37] all right so okay so the key idea of alpha-beta pruning is as we're like an [01:07:39] alpha-beta pruning is as we're like an optimal path is going to get to some [01:07:41] optimal path is going to get to some leaf node that has some utility and that [01:07:43] leaf node that has some utility and that utility is the thing that is going to be [01:07:46] utility is the thing that is going to be pushed up like like and then the [01:07:49] pushed up like like and then the interesting thing is if you pick the [01:07:51] interesting thing is if you pick the optimal path the value of 
[01:07:53] the values of the nodes on that optimal path are all going to be equal to each other: that utility gets pushed up all the way to the top. Because of that, we can't have a situation where the intervals have no intersection, because if this were the optimal path, the value at this node would have to be the same as the value at this node, the same as the value at that node, and so on. If the intervals don't overlap, there's no way those nodes have the same value, and no way for that path to be the optimal path. That's the reason this works: along the optimal path you have the same value throughout. All right, so how do we actually do this? The way we do it is we're going to keep a lower bound on max nodes,
[01:08:47] which I'm going to call a_s. So we have a_s, a lower bound on max nodes, [01:08:57] and we keep track of that. You're also going to keep track of b_s, which is an upper bound on min nodes. If the intervals don't overlap, we just drop that subtree; if they do overlap, we just keep updating a_s and b_s. OK, so here's an example. Let's say we start with this top node, and somehow we've found out that this top node should be greater than or equal to six. So that's my a_s value: a_s is equal to six, and it's a lower bound on my max node; I know the optimal value is going to be something greater than or equal to six. Then somehow we get to this min node and we realize that this min node should be less than or equal to eight. So you're here, and let's say the eight is here.
[01:10:04] You still have some overlap, so you're all good. So b_s is going to be equal to eight: we have an upper bound on the min node, and that upper bound is eight, so the value on the optimal path is going to be less than or equal to eight. So far so good. Then somehow I find out that this one is greater than or equal to three. Greater than or equal to three is fine, since we're still greater than or equal to six. Calling these states s1, s2, s3, my a_{s3} is equal to three, because I know this node needs to be greater than or equal to three; but the six already does the job, so I don't need to worry about that three. And then for this last node, I'm at this min node and I realize that b_{s4} is equal to five, and what that tells me is that the value should be less than or equal to five,
[01:11:06] so I'm going to update less than or equal to eight down to less than or equal to five. [01:11:12] And now I don't have any overlap, and what that tells me is that this path is not going to be the optimal path: with no overlap, we're not going to find that single number that is going to be the utility. So I can actually ignore that whole subtree, because it's not going to be on my optimal path; I can get rid of it. Yes? So the question is whether we're ignoring the three in a different way. Yes: we're ignoring the value of three because it's already encoded here, but we're ignoring the subtree under the five in the sense of not exploring it at all. I did need to explore things after the three, because with the three we still had an overlap with the beta. So with the b value, you're looking at the overlap between your upper bound on the min node
[01:12:07] and your lower bound on the max node; that's the interval you're making sure still has values in it. If the two were a three instead, we'd just ignore it anyway, because you already have something else covering it. If the three were a two, is that what you're saying? Yes, you want to have non-trivial intervals, basically; if you see the same value, you still don't have a non-trivial interval. And where did we get the six and the eight? This is an example I made up; we'll talk about examples where we actually compute these bounds, but for now just assume that somehow we've found them. [01:12:57] Another question: why is b an upper bound? It seems like a lower bound. So, I'm not showing a full example here, and the actual
[01:13:11] values are coming from somewhere that I'm not talking about yet. Oh, the one at the top? OK, sorry, yes. So the one at the top is a min node, just like in the earlier example: at my min node I found that the minimum between three and five is three. The max node is then maximizing between three and a bunch of other things; that's what it's supposed to do. And if it's maximizing between three and a bunch of other things, then it's at least going to be three. There's no way for it to be two, and it's not going to be zero, because it takes the maximum of three and something else. That's why I say that whatever value I get at this max node is going to be greater than or equal to three. So now I come down here and I see this two; this is a min node.
[01:14:06] The value here is going to be the minimum between two and whatever is down this tree, so, in the roundabout way we said it, it's going to be two or lower. What we get here is going to be two or lower: I'm either going to get 2 or 1 or 0 or so on, and that's the value that would get pushed up here. So the value coming from down here is going to be 2 or lower, and if I'm maximizing between 3 and something that is 2 or lower, then the 3 is enough. I can figure that out from these intervals and not look at that side of the tree: once I've seen this two, I already know there's no non-trivial interval between a value that's greater than or equal to 3 and a value that's less than or equal to 2, so I can just not worry about that part.
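The overlap test applied repeatedly above can be written down directly. `can_prune` is an assumed helper name; the convention that a single shared point also prunes follows the "non-trivial intervals" remark in the lecture:

```python
# The overlap test, written out. can_prune is an assumed helper name.
# A max-node lower bound alpha and a min-child upper bound beta leave a
# non-trivial interval only when beta > alpha.

def can_prune(alpha, beta):
    """True when [alpha, +inf) and (-inf, beta] have no non-trivial overlap."""
    return beta <= alpha

print(can_prune(alpha=3, beta=2))  # True: the example above, prune the subtree
print(can_prune(alpha=3, beta=8))  # False: overlap [3, 8], keep exploring
print(can_prune(alpha=3, beta=3))  # True: a single shared point is trivial
```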
[01:15:10] All right, one quick implementation note. We talked about these a values and b values; you can actually keep track of just one value each, and those are the alpha value and the beta value. Let me write it here: alpha(s) is the max of a_{s'} over all the nodes s' seen so far above s. What this basically says is, remember when we saw the three we said it was already covered, we already knew more; it's the same idea. So alpha(s) is just going to be one value, in this case six, because when I see the three I don't really care about it: I already know I'm greater than or equal to six, and knowing I'm greater than or equal to three adds nothing. So we keep track of one value, alpha(s), which here is just equal to six. And we do a similar thing for beta: we keep track of beta(s).
[01:16:06] Beta(s) is just the minimum of b_{s'}, where again s' ranges over the nodes seen so far in that ordering. So you have beta(s), and you look at the intervals, alpha(s) and above versus beta(s) and below, and if those intervals have no non-trivial intersection, then you can prune that part of the tree. This is more of an implementation detail: instead of keeping track of all these a_s and b_s values, you just keep one alpha and one beta. All right, let's look at one other example; I'll do this one real quick. We start from some top node and go down to this node, which is a min node between nine and seven. So it's a min node, I'm going to get a seven here, and I realize that this max node is going to be something that's at least seven.
[01:17:13] It's going to be something greater than or equal to seven, so my alpha is going to be seven. Now I know that whatever value the top node gets, it's got to be 7 or higher. So now I come down here and I'm at a min node; I see a 6. It's a min node, so whatever we get here is going to be less than or equal to 6, 6 or something lower. That tells me my beta is equal to 6: whatever I'm getting at that min node is going to be 6 or lower, and that has no intersection with my alpha(s). So I can just not do anything with this branch; I don't need to go over all these other things, I can ignore this whole bunch. All right, so now I go back up and come down here, and I'm at a min node.
so remember the way we were computing [01:18:17] so remember the way we were computing these beta values we were based on the [01:18:19] these beta values we were based on the notice that we have seen previously so I [01:18:21] notice that we have seen previously so I have a new beta now cuz I'm done with [01:18:23] have a new beta now cuz I'm done with this branch right so I need to get here [01:18:25] this branch right so I need to get here here I have a min between what is it 8 8 [01:18:30] here I have a min between what is it 8 8 and 3 so okay so so I see my maybe let [01:18:34] and 3 so okay so so I see my maybe let me just rate I see my 8 here [01:18:36] me just rate I see my 8 here it's a min node so it's going to be less [01:18:38] it's a min node so it's going to be less than or equal to 8 so my new beta value [01:18:41] than or equal to 8 so my new beta value is going to be 8 my alpha is still 7 [01:18:46] is going to be 8 my alpha is still 7 because that's for my top note so it's 8 [01:18:48] because that's for my top note so it's 8 or lower we do have an interval [01:18:52] or lower we do have an interval overlapping interval 7 to 8 everything [01:18:54] overlapping interval 7 to 8 everything is good so I actually need to go and see [01:18:57] is good so I actually need to go and see what this value is this value is 3 so I [01:19:01] what this value is this value is 3 so I get 3 here or like it's exactly equal to [01:19:04] get 3 here or like it's exactly equal to 3 so that updates my beta from 8 to 3 [01:19:09] 3 so that updates my beta from 8 to 3 we'll have already explored that part of [01:19:11] we'll have already explored that part of the tree anyways but 3 you don't have an [01:19:14] the tree anyways but 3 you don't have an interval if there were a bunch of things [01:19:16] interval if there were a bunch of things below this 3 like I like a nice somehow [01:19:19] below this 3 like I like a nice somehow sound it's not like I wouldn't need to 
[01:19:20] sound it's not like I wouldn't need to explore it but we don't really have that [01:19:21] explore it but we don't really have that and then we just find that our optimal [01:19:24] and then we just find that our optimal value 7 so we just return something okay [01:19:27] value 7 so we just return something okay and we did an explore this giant middle [01:19:30] and we did an explore this giant middle of the tree okay one more slide and [01:19:34] of the tree okay one more slide and enough two more two more quick one quick [01:19:36] enough two more two more quick one quick idea okay so yeah alright so the order [01:19:40] idea okay so yeah alright so the order of things actually matters so the only [01:19:42] of things actually matters so the only thing I want to mention about this idea [01:19:43] thing I want to mention about this idea of pruning is this order of things [01:19:45] of pruning is this order of things matter so so when you look at this [01:19:47] matter so so when you look at this example remember we didn't explore [01:19:48] example remember we didn't explore anything about the ten because we [01:19:50] anything about the ten because we already knew that this value needs to be [01:19:52] already knew that this value needs to be greater than equal to three these are my [01:19:54] greater than equal to three these are my buckets right if I swap the buckets like [01:19:56] buckets right if I swap the buckets like if I just swap the order of buckets I [01:19:58] if I just swap the order of buckets I moved the to ten bucket to this side [01:19:59] moved the to ten bucket to this side three five pocket to the other side I [01:20:01] three five pocket to the other side I wouldn't be able to do that I actually [01:20:03] wouldn't be able to do that I actually need to explore the whole tree because [01:20:06] need to explore the whole tree because my alpha and beta [01:20:07] my alpha and beta wouldn't have the same properties so the [01:20:09] 
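The pruning logic walked through above can be sketched in code. A caveat: the exact leaf values of the lecture's tree aren't fully recoverable from the transcript, so the tree below is a hypothetical stand-in that reproduces the same reasoning: the first min branch yields 7 (alpha becomes 7), the second branch is pruned as soon as a 6 is seen, and the third explores 8 and then 3, leaving 7 as the answer.

```python
import math

visited = []  # leaves actually evaluated, to show what pruning skips

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax with alpha-beta pruning over a nested-list game tree."""
    if isinstance(node, int):          # leaf: just return its value
        visited.append(node)
        return node
    if maximizing:                     # max node: raises alpha
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, v)
            if alpha >= beta:          # intervals no longer overlap: prune
                break
        return v
    else:                              # min node: lowers beta
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True))
            beta = min(beta, v)
            if alpha >= beta:
                break
        return v

# max root over three min nodes (hypothetical values, see note above)
tree = [[7, 9], [6, 100, 100], [8, 3]]
print(alphabeta(tree))   # 7
print(visited)           # [7, 9, 6, 8, 3] -- the 100s were never looked at
```

This is also where move ordering helps: if the `[8, 3]` branch listed 3 first, beta would drop to 3 immediately and the 8 would be skipped as well.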
[01:20:09] So the order in which you put things in the tree actually matters, and you should care about that. Worst-case scenario, our ordering is terrible, so we need to actually go over the full tree; that's O(b^(2d)), the worst case. And there is a best ordering, where you effectively only pay for half the depth: best ordering is going to be O(b^(2·d/2)). So if with the worst ordering you could explore up to depth 10, then with the best ordering you can actually explore up to depth 20; that's a huge improvement. And then random ordering turns out to be pretty okay too: random ordering would be O(b^(2·(3/4)·d)). So even if you had a random ordering, it would be better than the worst-case scenario. Well, how do you figure out what a good ordering is? We can use the evaluation function: remember, you're computing the evaluation function anyway, and what you can do is, for max nodes, order the successors by decreasing evaluation function, and for min nodes, order the successors by increasing evaluation function. That allows you to prune as much as possible. All right, so with that, I'll see you guys next lecture, talking about TD learning.

================================================================================
LECTURE 022
================================================================================
Game Playing 2 - TD Learning, Game Theory | Stanford CS221: Artificial Intelligence (Autumn 2019)
Source: https://www.youtube.com/watch?v=WoFwXj4p4Sc
---
Transcript

[00:00:04] Let's start, guys. Okay, so we're going to continue talking about games today. Just a quick announcement: the project proposals are due today, I think you all know that. All right, so... tomorrow, right? Okay, yeah, today is not Thursday; they're due tomorrow.
For a second I thought it was Thursday. [00:00:32] All right, so let's talk about games. We started talking about games last time: we formalized them, we talked about zero-sum two-player games that were turn-taking, right, and we talked about a bunch of different strategies to solve them, like the minimax strategy or the expectimax strategy. Today we want to talk a little bit about learning in the setting of games: what does learning mean, how do we learn those evaluation functions that we talked about? And then towards the end of the lecture we want to talk a little bit about variations of the games we have talked about: what about the cases where we have simultaneous games or non-zero-sum games? So that's the plan for today.

[00:01:13] So I'm going to start with a question that we're actually going to talk about towards the end of the lecture, but it's a good motivation. Think about a setting where we have a simultaneous two-player zero-sum game. It's a two-player zero-sum game, similar to the games we talked about last time, but it is simultaneous: you're not taking turns, you're playing at the same time. An example of that is rock-paper-scissors. So, can you still be optimal if you reveal your strategy? Say you're playing with someone and you tell them what your strategy is; can you still be optimal? That's the question.

[00:01:54] [A student suggests that if you reveal exactly what you're going to play you won't be successful in a small zero-sum game, but at a larger scale you could still be successful if your approach is superior to the others.] So the answer was about the size of the game, rock-paper-scissors being small versus not being small. The question is more of a motivating thing; we'll talk about the details towards the end of the class. It's actually not the size that matters, it's the type of strategy that you play that matters, just to give you an idea. The reason we have put this at the beginning of the lecture is that, intuitively, when you think about this you might say, no, I'm not going to tell you what my strategy is, because if I say I'm going to play scissors, you'll know what to play. But this has an interesting answer that we're going to get to towards the end of the lecture, so it's more of a motivating example; don't think about it too hard.

[00:02:48] All right, so let's do a quick review of games. Last time we talked about having an agent and an opponent playing against each other. You were playing for the agent, and the agent was trying to maximize its utility.
The example we looked at was: the agent is going to pick bucket A, bucket B, or bucket C, and then the opponent is going to pick a number from those buckets; they can either pick -50 or 50, 1 or 3, or -5 or 15. And then, if you want to maximize your utility as an agent, you can potentially think that your opponent is trying to minimize your utility, and you can have this minimax game, the two of you playing against each other, and based on that decide what to do. So we had this minimax tree, and based on that, the utilities that are going to pop up are -50, 1, and -5. So if your goal is to maximize your utility, you're going to pick bucket B, the second bucket, because that's the best thing you can do assuming your opponent is a minimizer. So that was the setup that we started looking at, and the way we thought about solving this game was by writing a recurrence.
[00:03:54] So we had this value V, which was the minimax value at state s. If you're at an end state, you're going to get the utility of s, because we get the utility only at the very end of the game. If the agent is playing, the recurrence is to maximize V over the successor states, and if the opponent is playing, we want to minimize the value over the successor states. So that was the recurrence we started with. And we looked at games that are pretty large, like the game of chess; if you think about the game of chess, the branching factor is huge and the depth is really large, so it's not practical to do the full recurrence. So we started talking about ways of speeding things up, and one way to speed things up is this idea of using an evaluation function.
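For reference, the recurrence just reviewed can be written out as follows (using the IsEnd/Succ/Actions notation from the minimax lecture):

```latex
V_{\text{minimax}}(s) =
\begin{cases}
\text{Utility}(s) & \text{if } \text{IsEnd}(s)\\[2pt]
\displaystyle\max_{a \in \text{Actions}(s)} V_{\text{minimax}}(\text{Succ}(s,a)) & \text{if } \text{Player}(s) = \text{agent}\\[2pt]
\displaystyle\min_{a \in \text{Actions}(s)} V_{\text{minimax}}(\text{Succ}(s,a)) & \text{if } \text{Player}(s) = \text{opp}
\end{cases}
```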
So, do the recurrence, but only do it until some depth: don't go over the full tree, just go down to some depth, and after that just call an evaluation function. Hopefully your evaluation function, which is a weak estimate of your value, is going to work well and give you an idea of what to do next. So instead of the usual recurrence, what we did was add this d here, the depth until which we are exploring; we decrease the value of the depth after the agent and the opponent play, and when the depth is equal to zero, we just call the evaluation function. Intuitively, if you're playing chess, for example, you might think a few steps ahead, and when you think a few steps ahead, you might think about what the board looks like and evaluate that board based on its features.
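A minimal sketch of this depth-limited recurrence, reusing the bucket game from the review as the toy game. The `BucketGame` encoding, and the convention of decrementing d only after the opponent moves (one full round), are my assumptions; the transcript only says depth decreases after the agent and opponent play.

```python
class BucketGame:
    """Toy game: agent picks a bucket, opponent picks a number from it."""
    buckets = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}

    def is_end(self, s):  return isinstance(s, int)   # a picked number ends the game
    def utility(self, s): return s
    def player(self, s):  return "agent" if s == "root" else "opp"
    def actions(self, s): return list(self.buckets) if s == "root" else self.buckets[s]
    def succ(self, s, a): return a

def value(game, s, d, eval_fn):
    """Depth-limited minimax value V(s, d)."""
    if game.is_end(s):
        return game.utility(s)
    if d == 0:
        return eval_fn(s)                  # cutoff: weak estimate of the value
    if game.player(s) == "agent":          # agent maximizes
        return max(value(game, game.succ(s, a), d, eval_fn) for a in game.actions(s))
    else:                                  # opponent minimizes; round over, so d - 1
        return min(value(game, game.succ(s, a), d - 1, eval_fn) for a in game.actions(s))

game = BucketGame()
print(value(game, "root", 1, lambda s: 0))   # 1 -> bucket B is the best choice
```

With d large enough to reach the end states, the evaluation function is never called and this reduces to plain minimax.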
Based on that you might decide to take various actions, so it's a similar type of idea. And then the question was, well, how are we going to come up with an evaluation function? Where is this evaluation function coming from? One idea that we talked about last time was that it can be handcrafted: the designer can come in and sit down and figure out what a good evaluation function is. In the game of chess, an example is an evaluation function that depends on the number of pieces you have, the mobility of your pieces, maybe the safety of your king, central control, all these various things that you might care about. So the difference between the number of queens that you have and your opponent's number of queens, these are features that you care about, and potentially a designer can come in and say, well, I care about queens nine times more than I care about how many pawns I have. So you can actually hand-design these things and write down these weights for how much you care about each feature. I'm using terminology from the learning lectures, right? I'm saying we have weights here and we have features here, and someone can come in and just handcraft them.

[00:06:37] Well, one other thing we can do, instead of handcrafting it, is to actually try to learn this evaluation function. We can still have the features, right? We can still say, well, I care about the number of kings and queens that I have, but I don't know how much I care about them, and I actually want to learn that evaluation function, learn what the weights should be. To do that, I can write my evaluation function Eval(s) as a function of the state, parameterized by weights w, and my goal is to figure out what these weights w are.
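A sketch of such a handcrafted linear material evaluation. Only the "a queen is worth about nine pawns" figure comes from the lecture; the other weights are standard chess conventions I'm filling in for illustration, not values from the transcript.

```python
# hand-designed weights: how much the designer cares about each piece type
WEIGHTS = {"K": 10000, "Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material_eval(mine, theirs):
    """Eval(s) = sum over pieces p of w_p * (my count of p - opponent's count of p)."""
    return sum(w * (mine.get(p, 0) - theirs.get(p, 0)) for p, w in WEIGHTS.items())

# up a queen, down three pawns: still ahead by 9 - 3 = 6
print(material_eval({"Q": 1, "P": 5}, {"P": 8}))   # 6
```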
And ideally I want to learn them from some data. Okay, so we're going to talk about how learning is applied to the game setting; specifically, the way we're using learning in these game settings is just to get a better sense, from some data, of what this evaluation function should be. So the questions you might have right now are: well, what does V look like, and where does my data come from? Because if you know where your data comes from and what your V is, then all you need to do is come up with a learning algorithm that takes your data and tries to figure out what your V is. So we're going to talk about that in the first part of the lecture, and that introduces this temporal difference learning, which we're going to discuss in a second; it's very similar to Q-learning. And then towards the end of the class we'll talk about simultaneous games and non-zero-sum games.

[00:08:02] All right, so let's start with this V function. I just said this V function could be parameterized by a set of weights w, and the simplest form of this V function is to just write it as a linear predictor, as a linear function of a set of features: w dot phi(s). These are features that are hand-coded, someone writes them down, and then I just want to figure out what the w's are. So this is the simplest form, but in general this V function doesn't need to be linear; it can actually be any supervised learning model that you have discussed in the first few lectures. It can be a neural network that does regression, it can be anything even more complicated than that. Basically, any model you could use in supervised learning could be placed here
as my V function. [00:08:44] So all I'm doing is writing this V function as a function of the state and a bunch of parameters; those parameters, in the case of the linear predictor, are just the w's, and in the case of a one-layer neural network, they are the w's and the v's. All right, so let's look at an example, and I'm going to focus on the linear way of looking at this, just for simplicity. Okay, let's pick a game: we're going to look at backgammon. [00:09:16] So this is a very old two-player game. The way it works is you have the red player and the white player, each of them has these pieces, and what they want to do is move all their pieces from one side of the board to the other side of the board. It's a game of chance: you roll two dice, and based on the outcome of your dice, you move your pieces various amounts to various columns. There are a bunch of rules; your goal is to get all your pieces off the board, but if you have only one piece somewhere and your opponent gets on top of you, they can push you to the bar and you have to start again. There are a bunch of rules about it; read about it on Wikipedia if you're interested. But we're going to look at a simplified version of it. In this simplified version, I have player O and player X, and I only have four columns: columns 0, 1, 2, and 3. In this case I have four pieces for each of these players, and the idea is we want to come up with features that we would care about in this game of backgammon. So what are some features that you think might be useful? Remember the learning lecture: how do we come up with feature templates?
a [00:10:32] is still down with the color but it's a mistake so maybe like the location of [00:10:34] mistake so maybe like the location of the X's and O's the number of them yeah [00:10:37] the X's and O's the number of them yeah yeah so like what idea is you have all [00:10:40] yeah so like what idea is you have all these knowledge about the boards so [00:10:41] these knowledge about the boards so maybe we should like care about the [00:10:42] maybe we should like care about the location of the X's maybe we should care [00:10:44] location of the X's maybe we should care about like where the O's are how many [00:10:46] about like where the O's are how many pieces are on the board how many pieces [00:10:47] pieces are on the board how many pieces are off the board so similar type of way [00:10:50] are off the board so similar type of way that we would come up with features in [00:10:51] that we would come up with features in the first few lectures we were basically [00:10:53] the first few lectures we were basically we would do the same thing so a feature [00:10:54] we would do the same thing so a feature template set of feature templates could [00:10:56] template set of feature templates could look like this like number of [00:10:58] look like this like number of X's or OS in column whatever con being [00:11:01] X's or OS in column whatever con being equal to some value or a number of [00:11:03] equal to some value or a number of excess zeros on the bar may be fraction [00:11:06] excess zeros on the bar may be fraction of excesses or OS that are removed whose [00:11:08] of excesses or OS that are removed whose turn it is so these are all like [00:11:10] turn it is so these are all like potential features that it could be so [00:11:11] potential features that it could be so for this particular board here are what [00:11:14] for this particular board here are what those features would look like so if you [00:11:16] those features would look like so if you look at 
number of OS in column 0 equal [00:11:18] look at number of OS in column 0 equal to 1 that's equal to 1 remember we were [00:11:20] to 1 that's equal to 1 remember we were using these indicator functions to be [00:11:22] using these indicator functions to be more general so like here again we are [00:11:24] more general so like here again we are using this indicator functions you might [00:11:26] using this indicator functions you might ask number of O's on a bar that's equal [00:11:28] ask number of O's on a bar that's equal to one fraction of O's that are removed [00:11:31] to one fraction of O's that are removed so I have four pieces two of them are [00:11:33] so I have four pieces two of them are already removed so that's one half [00:11:34] already removed so that's one half number of X's in column one equal to 1 [00:11:37] number of X's in column one equal to 1 that's one number of X's and columns [00:11:38] that's one number of X's and columns three equal to three that's one it's to [00:11:41] three equal to three that's one it's to stern so that's a cool okay so so we [00:11:43] stern so that's a cool okay so so we have a bunch of features these features [00:11:45] have a bunch of features these features kind of explain what the sport looks [00:11:47] kind of explain what the sport looks like or how good this world is and what [00:11:49] like or how good this world is and what we want to do is we want to figure out [00:11:50] we want to do is we want to figure out what it what are the weights that we [00:11:53] what it what are the weights that we should put for each one of these [00:11:54] should put for each one of these features and how much we should care [00:11:55] features and how much we should care about each one of these features so that [00:11:57] about each one of these features so that is the goal of learning here okay all [00:12:01] is the goal of learning here okay all right so okay so that was my model right [00:12:03] right so okay so that 
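As a rough sketch of the indicator-style feature templates described above: the board encoding below (dicts of column counts, bar counts, removed counts, and whose turn it is) is an assumption for illustration, not the lecture's actual code.

```python
# Sketch of indicator-feature templates for the simplified backgammon board.
# The board representation is a hypothetical encoding, not from the lecture.

def extract_features(board):
    """Return a sparse feature vector as a dict: feature name -> value."""
    features = {}
    for player in ("O", "X"):
        # Indicator templates: number of <player>'s in column c equals n.
        for col, n in board["columns"][player].items():
            features[f"num {player} in column {col} is {n}"] = 1
        # Indicator: number of <player>'s on the bar equals n.
        features[f"num {player} on bar is {board['bar'][player]}"] = 1
        # Real-valued template: fraction of <player>'s pieces removed.
        features[f"fraction {player} removed"] = board["removed"][player] / 4
    # Indicator: whose turn it is.
    features[f"it is {board['turn']}'s turn"] = 1
    return features

# The example board from the lecture: one O in column 0, one O on the bar,
# two O's removed, one X in column 1, three X's in column 3, O to move.
board = {
    "columns": {"O": {0: 1}, "X": {1: 1, 3: 3}},
    "bar": {"O": 1, "X": 0},
    "removed": {"O": 2, "X": 0},
    "turn": "O",
}
print(extract_features(board))
```

The dict-of-named-features style mirrors the sparse feature vectors used earlier in the course: only the indicators that fire appear, with value 1, alongside the real-valued "fraction removed" entries.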
Okay, so that was my model. [00:12:03] So far I've talked about this V(s; w): I defined it as a linear predictor, w dot features. And now the question is, where do I get data? Because if I'm doing learning, I've got to get data from somewhere. [00:12:19] So the idea we can use here is to generate data based on our current policy, pi agent or pi opponent, which is based on our current estimate of V. Currently I might have some idea of what this V function is; it might be a very bad estimate of V, but that's okay, I can just start with that. [00:12:40] Starting with the V function I currently have, I can take the argmax of V over successors of (s, a) to get a policy for my agent; remember, this was how we were getting a policy in the minimax setting. The policy for the opponent is just the argmin of that V function. [00:12:56] And then when I call these policies I get a bunch of actions, I get a sequence of states based on how we're following these policies, and that is data I can actually go over to try to make V better and better. So that's how we do it: we call these policies, we get a bunch of episodes, and we go over them to make things better and better. That's the key idea. [00:13:19] One question you might have at this point is: is this deterministic or not? Do I need to do something like epsilon-greedy? In general you would need to do something like epsilon-greedy, but in this particular case you don't really need to, because we have the dice: by rolling the dice you get different random paths that you might take, which take you to different states. So we already have an element of randomness that does some of the exploration for us. [00:13:48] A student asks why epsilon-greedy would be needed. What I mean here is: do I need to do extra exploration? Am I going to get stuck in a particular set of states if I don't explore? In this particular case, because we have this randomness, we don't really need to do that; in general, you might imagine having some sort of epsilon-greedy to make us explore a little bit more. [00:14:11] Okay, so we generate episodes, and then from these episodes we want to learn. Episodes look like state, action, reward, state, and they keep going until you get a full episode. One thing to notice here is that the reward is going to be 0 throughout the episode until the very end of the game; when we end the episode, we might get some reward at that point, or we might not.
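The data-generation loop described here (roll the dice, act greedily with respect to the current V for the agent and adversarially for the opponent, record (s, a, r, s') pieces of experience) can be sketched on a made-up toy game. Everything below, including the little race game and its features, is an illustrative assumption, not the lecture's backgammon setup:

```python
import random

# Toy sketch of generating an episode from the current evaluation function.
# The game (a race to position 10 with a two-sided die) is a made-up
# stand-in for backgammon; V is a hand-rolled linear evaluation w . phi(s).

N = 10  # first player to reach position N ends the game

def phi(state):
    agent_pos, opp_pos, turn = state
    return [agent_pos / N, opp_pos / N]

def V(state, w):
    return sum(wi * fi for wi, fi in zip(w, phi(state)))

def successor(state, action, roll):
    """Deterministic once the dice roll is known: mover advances action + roll."""
    agent_pos, opp_pos, turn = state
    if turn == "agent":
        return (min(N, agent_pos + action + roll), opp_pos, "opp")
    return (agent_pos, min(N, opp_pos + action + roll), "agent")

def generate_episode(w, seed=0):
    rng = random.Random(seed)
    state = (0, 0, "agent")
    episode = []
    while state[0] < N and state[1] < N:
        roll = rng.choice([0, 1])  # the dice supply the exploration randomness
        actions = [1, 2]
        # Agent maximizes V over successors; opponent minimizes (minimax-style).
        best = max if state[2] == "agent" else min
        a = best(actions, key=lambda act: V(successor(state, act, roll), w))
        nxt = successor(state, a, roll)
        # Reward is 0 throughout; only the terminal transition pays off.
        r = 1 if nxt[0] >= N else (-1 if nxt[1] >= N else 0)
        episode.append((state, a, r, nxt))
        state = nxt
    return episode

episode = generate_episode(w=[1.0, -1.0])
assert all(r == 0 for _, _, r, _ in episode[:-1])  # zero reward until the end
```

Note the structure this produces: a list of (s, a, r, s') tuples where every reward is 0 except possibly the last one, which is exactly the shape of episode the lecture works with next.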
But the reward throughout is going to be equal to 0, because we're playing a game, right? You're not getting any rewards in the middle. [00:14:41] And if you think about each one of these small pieces of experience, (s, a, r, s'), you can try to learn something from each one of them. Okay, so going back to the board, what you have is a piece of experience: you are at some state s, you take an action a, you get some reward, maybe it is zero, that's fine, and you go to some s'. [00:15:13] And you have some prediction: your prediction is your current V function. So your prediction is going to be this V function at the state s, parameterized with w; this is what you already kind of know right now, your current estimate of what V is. And I'm writing the prediction as a function of w, because it depends on w. [00:15:41] And then we have a target that we're trying to get to, and my target, which kind of acts as a label, is going to be equal to the reward that I'm getting, plus, I'm going to write the discount factor, gamma times V of s'. So my target, the thing I'm trying to get to, is r plus gamma V(s'; w). [00:16:24] We're playing games, and in games gamma is usually 1; I'm going to keep it here for now, but I'm going to drop it at some point, so you don't really need to worry about gamma. [00:16:31] One other thing to notice here is that I'm not writing the target as a function of w, because the target acts kind of like my label: if I'm trying to do regression here, the target is my label, it's kind of the ground-truth thing I'm trying to get to. So I'm going to treat my target as just a value; I'm not writing it as a function of w. [00:16:49] All right, so what do we usually try to do when we're doing learning and we have a prediction and a target? Minimize the error, yes. So I can write my error as the squared error: 1/2 times (prediction(w) minus target) squared. That is my squared error, and I want to minimize it with respect to w. [00:17:18] How do I do that? I can take the gradient. What is the gradient equal to? This is simple: the 2 and the 1/2 cancel, and the gradient is just (prediction(w) minus target) times the gradient of the inner expression. The gradient of the inner expression with respect to w is the gradient of the prediction with respect to w, minus zero, since the target is treated as a number. [00:18:04] Okay, let me move this up. So now I have the gradient; what algorithm should I use? I can use gradient descent. So I'm going to update my w; how do I update it? I move in the negative direction of my gradient using some learning rate eta: w becomes w minus eta times (prediction(w) minus target) times the gradient of prediction(w). [00:18:38] That's actually what's on the slide: the objective function is (prediction minus target) squared, the gradient is (prediction minus target) times the gradient of the prediction, and the update just moves in the negative direction of the gradient. This is what you guys have seen already. All right, so far so good. [00:18:59] So this is the TD learning algorithm; this is all it does.
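This update can be written out in a few lines for a linear value function (the feature vectors and step size below are illustrative numbers, not from the lecture):

```python
# TD-style gradient step on the squared error 1/2 * (prediction - target)^2,
# for a linear value function V(s; w) = w . phi(s).

def predict(w, phi_s):
    return sum(wi * fi for wi, fi in zip(w, phi_s))

def td_update(w, phi_s, reward, phi_sp, eta=0.1, gamma=1.0):
    p = predict(w, phi_s)                    # prediction V(s; w)
    t = reward + gamma * predict(w, phi_sp)  # target r + gamma V(s'; w),
                                             # treated as a constant label
    # Gradient of 1/2 (p - t)^2 w.r.t. w is (p - t) * phi(s) for linear V.
    return [wi - eta * (p - t) * fi for wi, fi in zip(w, phi_s)]

# Illustrative numbers: one step shrinks the squared error.
w = [0.0, 0.0]
phi_s, phi_sp = [1.0, 2.0], [0.0, 1.0]
before = 0.5 * (predict(w, phi_s) - (1.0 + predict(w, phi_sp))) ** 2
w2 = td_update(w, phi_s, reward=1.0, phi_sp=phi_sp)
after = 0.5 * (predict(w2, phi_s) - (1.0 + predict(w2, phi_sp))) ** 2
```

Since the target is treated as a constant, the step is plain gradient descent on a regression loss; the only TD-specific part is that the "label" is built from the current w.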
[00:19:07] Temporal difference learning picks these pieces of experience (s, a, r, s'), and based on each piece of experience it just updates w with this gradient descent update: the difference between prediction and target, times the gradient of V. [00:19:24] So what happens if I have a linear function? Let me write this in the case where I have a linear function. What if my V(s; w) is just equal to w dot phi(s)? What happens to my update? w becomes w minus eta times, what is the prediction? It's w dot phi(s). What is the target? We defined it up there: it's the reward you're getting, the immediate reward, plus gamma times V(s'; w), which is w dot phi(s'). And the gradient of your prediction, which is what? It's phi(s). [00:20:20] So I just wrote the update in the case of a linear predictor. A student asks what the difference is between this and Q-learning. Yeah, so this is very similar to Q-learning; there are very minor differences that we'll talk about at the end of this section, comparing it to Q-learning. [00:20:34] All right, so I want to go over an example. It's kind of a tedious example, but I think it helps to go over it and say why it works, especially in the case where the reward is just equal to zero throughout an episode; it kind of feels funny to use this algorithm and have it work, but it works. So I want to go over one example of this. [00:20:53] I'm going to show you one episode, starting from s1 to some other state. I have an episode: I start from some state, and I get some features of that state; again, these features come from just evaluating those hand-coded features. And what w should I start with? Let me just initialize w to be equal to 0.
ok right how do I update my W [00:21:21] equal to 0 ok right how do I update my W maybe let me just write it in this so so [00:21:23] maybe let me just write it in this so so this is I want to write in as simple or [00:21:26] this is I want to write in as simple or not the simpler form we're just in the [00:21:27] not the simpler form we're just in the other form so W the way we're updating [00:21:29] other form so W the way we're updating it is previous W - ADA times prediction [00:21:34] it is previous W - ADA times prediction - target I'm going to use P and T for [00:21:36] - target I'm going to use P and T for prediction - target times V of s this is [00:21:40] prediction - target times V of s this is update you are doing ok yeah that's [00:21:44] update you are doing ok yeah that's right okay so so what is my prediction [00:21:47] right okay so so what is my prediction with my prediction W dot C of s 0 what [00:21:54] with my prediction W dot C of s 0 what is my target so for my target I need to [00:21:56] is my target so for my target I need to know what state I'm ending up at I'm [00:21:58] know what state I'm ending up at I'm gonna end up at 1 0 in this episode and [00:22:01] gonna end up at 1 0 in this episode and I'm gonna get a reward of 0 so what is [00:22:03] I'm gonna get a reward of 0 so what is my target my target is reward which is 0 [00:22:06] my target my target is reward which is 0 plus double [00:22:07] plus double times V of s prime that is zero because [00:22:09] times V of s prime that is zero because W is equal to zero so my target is equal [00:22:11] W is equal to zero so my target is equal to zero my P minus P is equal to zero so [00:22:15] to zero my P minus P is equal to zero so P minus C is equal to zero this whole [00:22:17] P minus C is equal to zero this whole thing is 0 W stays the same so in the [00:22:20] thing is 0 W stays the same so in the next kind of step that we use just okay [00:22:25] next kind of step that we use just 
okay I'm gonna move forward so what is [00:22:29] I'm gonna move forward so what is prediction here 0 x 0 prediction is 0 [00:22:33] prediction here 0 x 0 prediction is 0 what is target I haven't yeah it's 0 [00:22:36] what is target I haven't yeah it's 0 because I haven't got any anything any [00:22:38] because I haven't got any anything any word yet about 1/2 so yeah so target is [00:22:44] word yet about 1/2 so yeah so target is going to be a reward which is 0 plus 0 [00:22:47] going to be a reward which is 0 plus 0 times whatever state of V of s prime [00:22:49] times whatever state of V of s prime that I'm at so that's equal to 0 P minus [00:22:51] that I'm at so that's equal to 0 P minus C is equal to 0 it's kind of boring so [00:22:54] C is equal to 0 it's kind of boring so at this point W haven't changed W is [00:22:59] at this point W haven't changed W is equal to 0 what is my prediction [00:23:01] equal to 0 what is my prediction prediction is equal to 0 that's great [00:23:04] prediction is equal to 0 that's great what is target equal to so I'm gonna end [00:23:07] what is target equal to so I'm gonna end up in an end state where I get 1 0 and I [00:23:13] up in an end state where I get 1 0 and I get a reward of 1 so this is the first [00:23:16] get a reward of 1 so this is the first time I'm getting a reward which is my [00:23:18] time I'm getting a reward which is my target to be my target is reward 1 plus [00:23:23] target to be my target is reward 1 plus 0 times 1 0 which is 0 so my target is 1 [00:23:26] 0 times 1 0 which is 0 so my target is 1 so what this tells me is I'm predicting [00:23:29] so what this tells me is I'm predicting 0 but my target is 1 so I need to push [00:23:32] 0 but my target is 1 so I need to push my w's a little bit up to actually [00:23:34] my w's a little bit up to actually address the fact that this is this is [00:23:35] address the fact that this is this is this is equal to 1 so P minus C is equal [00:23:38] this is 
equal to 1 so P minus C is equal to minus 1 so I need to do an update [00:23:41] to minus 1 so I need to do an update maybe I'll do that update here so how am [00:23:44] maybe I'll do that update here so how am i updating it so I'm doing starting from [00:23:46] i updating it so I'm doing starting from zero zero minus my ADA is 0.5 that's [00:23:51] zero zero minus my ADA is 0.5 that's what I allowed it like I put it I [00:23:53] what I allowed it like I put it I defined it to be my prediction - target [00:23:55] defined it to be my prediction - target is minus 1 what is fee of s P of s is 1 [00:23:59] is minus 1 what is fee of s P of s is 1 2 right so what should my new W be for [00:24:07] 2 right so what should my new W be for this an equal to point 5 and then 1 X [00:24:12] this an equal to point 5 and then 1 X I'm just doing arithmetic here [00:24:13] I'm just doing arithmetic here so my new W is going to become 0.5 and 1 [00:24:17] so my new W is going to become 0.5 and 1 at the end of this one episode so I just [00:24:20] at the end of this one episode so I just did a 1 episode 1 full [00:24:21] did a 1 episode 1 full we're w0 throughout and then at the very [00:24:24] we're w0 throughout and then at the very end when I got a reward then I updated [00:24:26] end when I got a reward then I updated my W because I realized that my [00:24:28] my W because I realized that my prediction and target or not the same [00:24:29] prediction and target or not the same thing okay so now I'm gonna I'm gonna [00:24:32] thing okay so now I'm gonna I'm gonna start a new episode and the new episode [00:24:34] start a new episode and the new episode I'm starting is going to start with this [00:24:36] I'm starting is going to start with this particular W and in the new episode even [00:24:38] particular W and in the new episode even though the rewards are gonna be 0 [00:24:40] though the rewards are gonna be 0 throughout so like we're actually going [00:24:41] throughout so like 
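The arithmetic of this first episode can be replayed in a few lines, with eta = 0.5 and gamma = 1 as in the lecture. The feature vectors for the middle steps are placeholders (any features give a zero update while w = 0 and r = 0); the final state has features [1, 0], the state being updated has features [1, 2]:

```python
# Replaying the worked episode: w starts at 0, the reward is 0 until the
# final transition, and only the last update changes w.

eta, gamma = 0.5, 1.0
w = [0.0, 0.0]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# (phi(s), reward, phi(s')) triples for one episode; the last transition is
# terminal with next-state features [1, 0] and reward 1, as in the lecture.
episode = [
    ([1.0, 0.0], 0.0, [0.0, 1.0]),   # middle steps: P = T = 0, no change
    ([0.0, 1.0], 0.0, [1.0, 2.0]),
    ([1.0, 2.0], 1.0, [1.0, 0.0]),   # terminal: prediction 0, target 1
]

for phi_s, r, phi_sp in episode:
    p = dot(w, phi_s)                 # prediction w . phi(s)
    t = r + gamma * dot(w, phi_sp)    # target r + gamma * w . phi(s')
    w = [wi - eta * (p - t) * fi for wi, fi in zip(w, phi_s)]

print(w)  # -> [0.5, 1.0]
```

As in the lecture, the first two transitions leave w untouched (prediction and target are both 0), and the last one pushes w up to [0.5, 1].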
And in the new episode, even though the rewards are going to be 0 throughout, we're actually going to update our w's. This is amazing. [00:25:15] A student asks about states that share the same features. Yeah, it depends on what sort of features you use; you could use features that are really not representative. If you really want, say, s4 and s1 and s9 to be differentiated, we should pick features that differentiate between them; but if they are kind of the same and have the same sort of characteristics, it's fine. [00:25:53] Another student asks about a feature entry that never changes. Yeah, that entry will never converge, and that kind of tells you that you don't care about that entry in your feature vector: it's always staying the same, and if it is always zero it doesn't matter what the weight of that entry is. So in general we want features that are differentiating; otherwise you're losing something. [00:26:07] For the second row I'm not going to write it all out, because that takes time. Okay, so let's start a new episode. We start this one again, but now I'm starting with this new w that I have. So I can compute the prediction, the prediction is 1, and I can compute my target, it's 0.5, and what we realize here is that we overshot. Before, the prediction was 0 and the target was 1: we were undershooting, and we fixed our w's; but now we're overshooting, so we need to fix that. [00:26:38] A student asks about the relationship between the features and the weights: do they always have to be the same dimension, and what should we be thinking about that would make a good feature, specifically with respect to updating the weights? Okay, so first off, yes, they always need to be the same dimension, because you're doing this dot product between them. And for feature selection, you don't necessarily think of it as "how am I updating the weights"; you think of feature selection as: is it representative of how good my board is, in the case of backgammon, or of how well I'm navigating? [00:27:13] So it should be a representation of how good your state is, and it's usually hand-designed. You shouldn't think of it as "how is it helping my weights"; you should think of it as "how is it representing how good my state is". [00:27:25] A student asks: in the blackjack example, where you have a threshold of 21 and then another threshold, if you're using the same feature extraction for both, how does that affect the generalizability of the model, of the agent? Yeah, so you might choose two different features, and there is kind of a trade-off: you might get a feature that differentiates between different states very well, but then that makes learning longer and makes it not as generalizable; on the other hand, you might get a feature that's pretty generalizable, but then it might not capture the specific things you would want, those differentiating factors. So picking features is an art. [00:28:04] All right, let me move forward, because we have a bunch of things coming up. I'll go over this real quick then: we now update w based on this new value, and it's a similar thing: you have a prediction, you have a target, you're still overshooting, so you still need to update; and once you update it to [0.25, 0.75] it kind of stays there, and you're happy. [00:28:31] Okay, so this was just an example of TD learning, but this is the update you have kind of already seen, and a lot of you pointed out that this is similar to Q-learning.
[00:28:42] Right, this is actually pretty similar — the update is very similar. We have these gradients, the same way we have in Q-learning, and we are looking at the difference between prediction and target, the same way we were in Q-learning. But there are some minor differences. The first difference is that Q-learning operates on the Q function, and the Q function is a function over states and actions; here we are operating on a value function, V, and V is only a function of the state. Part of the reason is that in the setting of a game, you already know the rules of the game, so you kind of already know the actions — you don't need to worry about them the same way you do in Q-learning. The second difference is that Q-learning is an off-policy algorithm: the values are based on this estimate of the optimal policy, which is Q_opt. But TD learning is on-policy: the values are based on the exploration policy, which is based on a fixed π — and sure, you're updating the π, but you're going with whatever π you have, kind of running with that, and you keep updating it. OK, so that's another difference. And then finally, in Q-learning you don't need to know the MDP transitions — you don't need to know the transition function from s to s' — but in TD learning you need to know the rules of the game, so you need to know how the successor function Succ(s, a) works. So those are some minor differences, but from the perspective of how your update works, it is pretty similar to Q-learning.
[00:30:20] All right, so that was this idea: I have this evaluation function, I want to learn it from data — I'm going to generate data, and from that generated data I'm going to update my w. That's what we've been talking about so far. And the idea of using learning to play games is not a new idea, actually. In the '50s, Samuel looked at a checkers game program, where he was using ideas from self-play and ideas similar to the things we have talked about: using really smart features and linear evaluation functions to try to solve the checkers program. A bunch of other things he did included adding intermediate rewards — so throughout, on the way to the end point, he added some intermediate rewards — and using alpha-beta pruning and some search heuristics. And it was kind of impressive what he did in the '50s: he ended up with a program that was reaching human amateur level of play.
[00:31:12] And he only used something like 9K of memory, which is really impressive if you think about it. So this idea of learning in games is old; people have been using it. In the case of backgammon, this was around the '90s, when Tesauro came up with an algorithm to solve the game of backgammon. He specifically used the TD(λ) algorithm, which is similar to the TD learning that we have talked about; it has this λ parameter that kind of tells us how much credit states get as they get far from the reward. He didn't have any intermediate rewards, he used really dumb features, but then he used neural networks, which was kind of cool, and he was able to reach human expert play. And this kind of gave us some insights into how to play games and how to solve these really difficult problems. And then more recently, we have been looking at the game of Go.
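TD(λ) as Tesauro used it is standard TD with an eligibility trace decayed by λ, so credit for an error flows back to recently visited states. A minimal sketch with linear features — the step size, λ value, and episode format here are assumptions for illustration:

```python
import numpy as np

def td_lambda_episode(w, episode, eta=0.1, gamma=1.0, lam=0.7):
    """One pass of TD(lambda) over an episode of (phi_s, r, phi_s_next)."""
    z = np.zeros_like(w)                                # eligibility trace
    for phi_s, r, phi_s_next in episode:
        delta = r + gamma * (w @ phi_s_next) - (w @ phi_s)  # TD error
        z = gamma * lam * z + phi_s                     # recent states keep more credit
        w = w + eta * delta * z                         # update all traced weights
    return w
```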
[00:32:02] So in 2016 we had AlphaGo, which was using a lot of expert knowledge in addition to ideas from Monte Carlo tree search, and then in 2017 we had AlphaGo Zero, which wasn't even using expert knowledge — it was all based on self-play, it was using dumb features and neural networks, and basically the main idea was using Monte Carlo tree search to try to solve this really challenging, difficult problem. I think in this week's section you're going to talk a little bit about AlphaGo Zero too, if you're attending section. All right, so the summary so far: we've been talking about parameterizing these evaluation functions using features, and the idea of TD learning is to look at the error between our prediction and our target, and try to minimize that error and find better w's as we go. All right, so that was learning in games.
[00:32:59] So now I want to spend a little bit of time talking about other variations of games: the setting where we take our games from turn-based to simultaneous, and then the setting where we go from zero-sum to non-zero-sum. All right. OK, simultaneous games. So far we have talked about turn-based games like chess, where you play, and the next player plays, and you play, and the next player plays — and minimax-type strategies seem to be pretty OK when it comes to solving these turn-based games. But not all games are turn-based, right? An example is rock-paper-scissors: everyone is playing simultaneously, at the same time. The question is, how do we go about solving a simultaneous game? So let's start with a game that is a simplified version of rock-paper-scissors, called the two-finger Morra game. The way it works is we have two players, player A and player B.
[00:34:00] Each player is going to show either one finger or two fingers, and you're playing at the same time. The way it works is: if both players show one at the same time, player B gives two dollars to player A; if both show two at the same time, player B gives player A four dollars; and if you show different numbers — 1 and 2, or 2 and 1 — then player A has to give three dollars to player B. OK, does that make sense? So can you guys talk to your neighbors and play the game? [Music] All right, so what was the outcome? How many of you were in the case where A chose one and B chose one? Oh yeah, one, OK. How about here — A chose one, B chose two? Perfect, like four people played. So, A chose two, B chose one, OK, and then two and two. All right, so you can kind of see a whole mix of strategies happening here.
[00:35:38] This is the game we're going to talk about a little bit, and think about what would be a good strategy to use when you're solving this simultaneous game. All right, so let's formalize this. We have player A and player B, each with the possible actions of showing one or two, and then you're going to use this payoff matrix, which represents A's utility if A chooses action a and B chooses action b. So before, we had this value function over our state; now we have this value function that is, again, from the perspective of agent A. Remember, before, when we were thinking about the value function, we were looking at it from the perspective of the first player — the maximizer player, the agent. Now I'm looking at all of these scores from the perspective of player A, so I'm trying to get good things for A.
[00:36:38] Yeah, and then this is a one-step game too, right? You're just playing once, and then you see what you get — we're not talking about repeated games here. You play, you see what happens. OK, so we have this V, which is V(a, b), and this basically represents A's utility if agent A plays a and agent B plays b. You can represent this with a matrix, and that's why it's called a payoff matrix. I'm going to write that payoff matrix here. So, payoff matrix: here agent A can show one or can show two, and agent B can show one or can show two, right? If both of us show one at the same time, agent A gets two dollars; if both of us show two at the same time, agent A gets four dollars; otherwise agent A has to pay, so agent A gets minus three dollars. And again, the reason I only talk about one value is that we are still in the setting of zero-sum games.
[00:37:37] So whatever agent A gets, agent B gets the negative of, right? So if agent A gets four dollars, agent B is paying — agent B gets minus four dollars. So I'm just writing one V, from the perspective of agent A, and this is called the payoff matrix. All right, so now we need to talk about what a solution means in this setting — what is a policy in this setting? The way we refer to policies in this case is as strategies. We have pure strategies, which are almost the same thing as deterministic policies: a pure strategy is just a single action that you decide to take. So you have pure strategies and mixed strategies. The difference between a pure strategy and a deterministic policy is, if you remember, that a deterministic policy is a function of states, right? It's a policy that, as a function of the state, gives you an action.
[00:38:34] Here we have a one-move game, right? So it's just that one action, and we call it a pure strategy. We also have this other thing called a mixed strategy, which is equivalent to stochastic policies. What a mixed strategy is, is a probability distribution that tells you the probability of choosing each action. So pure strategies are just actions, and then you can have these things called mixed strategies, which are probability distributions over choosing actions. OK, all right, so here is an example. If you say, well, I'm always going to show you one, then you can write that strategy as a pure strategy that says: with probability one I show you one, and with probability zero I show you two — let's say the first column is for showing one and the second column is for showing two. So this is a pure strategy that says I'm always going to show you one.
[00:39:35] If I told you, well, I'm always going to show you two, then I can write that strategy like this, right: with probability 1, I'm always showing you 2. I can also come up with a mixed strategy. A mixed strategy would be: I'm going to flip a coin, and if I get heads I'm going to show you one, and if I get tails I'm going to show you two — and you can write that as [1/2, 1/2], and this is going to be a mixed strategy. You could totally play that in the simultaneous game; you could just bring chance in and be like, half the time I'm going to show you one, half the time I'm going to show you two, based on chance. Everyone happy with mixed strategies and pure strategies? All right, so how do we evaluate the value of a game? Remember, in the previous lecture — and in the MDP lecture even — we were talking about evaluating: if someone gives me a policy, how do I evaluate it?
[00:40:29] So the way we are evaluating that is again with this value function V, and we are going to write this value function as a function of π_A and π_B. I'll just write that up here — or I'm going to erase this. So I'm going to say: the value of agent A following π_A and agent B following π_B — what is that equal to? Well, that is going to be the probability that π_A chooses action a, times the probability that π_B chooses action b, times the value of the choices a and b, summed over all possible a's and b's. OK, so let's look at an actual example of this. For this particular case of the two-finger Morra game, let's say someone comes in and tells you what π_A is: the policy of agent A is just "always show one," and the policy of agent B is this mixed strategy — half the time show one, half the time show two. And the question is: what is the value of these two policies? How do we compute that?
[00:41:39] Well, I'm going to use my payoff matrix, right? So it's 1 times 1/2 times the value that we get for (1, 1), which is equal to 2 — so it's 1 × 1/2 × 2, plus 0 × 1/2 × 4, plus 1 × 1/2 × (−3) — the value that I get there is minus 3 — plus 0 × 1/2 × (−3). And what is that equal to? There are two zeros here, so that's minus 1/2. OK, so I just computed that the value of these two policies is going to be −1/2, and again this is from the perspective of agent A. And it kind of makes sense, right? If agent A tells you "I'm always going to show you 1," and agent B is following this mixed strategy, then agent A is probably losing — agent A is losing 1/2. [In response to a student question] That opens up a whole set of new questions that we're not discussing in this class — that introduces repeated games.
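The computation just worked out can be sketched directly from the payoff matrix (written from agent A's perspective; rows are A showing 1 or 2, columns are B showing 1 or 2):

```python
import numpy as np

# Two-finger Morra payoff matrix, from agent A's perspective.
V = np.array([[ 2.0, -3.0],
              [-3.0,  4.0]])

def game_value(pi_a, pi_b, V):
    # V(pi_a, pi_b) = sum over a, b of pi_a(a) * pi_b(b) * V(a, b)
    return pi_a @ V @ pi_b

pi_a = np.array([1.0, 0.0])  # pure strategy: always show one
pi_b = np.array([0.5, 0.5])  # mixed strategy: fair coin
print(game_value(pi_a, pi_b, V))  # -0.5, matching the value computed above
```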
[00:43:13] So you might be interested in looking at what happens in repeated games. In this class, right now, we're just talking about this one-step, one-play setting: we're playing a zero-sum game, like rock-paper-scissors, and you just play once. You might say, oh, what happens if you play, like, 10 times? Then you're building some relationship, and weird things can happen — and that introduces a whole new class of games. All right, so the value is equal to −1/2. OK. All right, so that was the game value. We just evaluated it, right? If someone tells me π_A and π_B, I can evaluate them — I can know how good π_A and π_B are, from the perspective of agent A. OK, so what do we want to do? When we say we want to try to solve games, all we want to do, from agent A's perspective, is maximize this — I want to get as much money as possible.
[00:44:03] And this value is from agent A's perspective, so agent A should be trying to maximize it, and agent B should be trying to minimize it — thinking minimax: agent B should be minimizing this, agent A should be maximizing this. Yeah, that's what we want to do. But the challenge here is that we're playing simultaneously, so we can't really use the minimax tree. Remember, in the minimax tree setting we had sequential play: we could wait for agent A to play and then play after that, and that would give us a lot of information. Here we are playing simultaneously, so what should we do? OK, so what should we do? I'm going to assume we can play sequentially — that's what I want to do for now. And I'm going to limit myself to pure strategies. So maybe I'll come over here. So right now I'm going to focus only on pure strategies.
[00:44:54] I'm just going to consider this very limited setting and see what happens, and I'm going to ask: what if we were to play sequentially? What would happen — how bad would it be? So we have the setting where player A goes first. What do you think — if player A goes first, is that better for player A, or worse? Worse for player A? OK, that's probably what's going to happen; let's find out. So player A is trying to maximize, right, and player B is trying to minimize, and each of them has the actions of either showing 1 or showing 2. This is player A; B can show one or two, right? If we show one–one, player A gets what — two dollars, is that right? That's right. Otherwise, player A gets minus three dollars; and if we have two–two, player A gets four dollars.
sequential setting and we're playing minimax, player B goes second, so player B takes the minimizer: here player B picks this one, and in this case player B picks this one. What should player A do? Well, in both cases player A gets minus three dollars, so it doesn't actually matter: player A can pick either action and at the end of the day gets minus three dollars. That's the case where player A goes first. [00:46:34] What if player A goes second? Then player B goes first, player B is minimizing, then player A is maximizing, and we have the same values here. [00:47:01] Player A, going second, tries to maximize, so A would pick these ones. Player B wants to minimize, so player B is going to say: okay, if
you're going second, I'd rather show you one, because by showing you one I'm losing less; if I show you two I'm losing even more. So in that setting player A is going to get two dollars. [00:47:33] All right, that was kind of intuitive: with pure strategies, it looks like going second should be better. Going second is no worse, it's the same or better, and that can be represented by this minimax relationship: agent A maximizes, and in the second case we maximize on the outside over A's action with player B going first, so that value is greater than or equal to the case where player A goes first. Does that make sense? [00:48:31] I'm going to write the things you're learning on the side of the board, maybe up here. So what did we just learn?
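The two sequential orderings just worked through can be checked in a few lines. A minimal Python sketch, assuming the payoff matrix from the example (one-one pays A $2, two-two pays $4, a mismatch pays −$3); the variable names are my own:

```python
# Sequential play with pure strategies, using the example's payoff matrix
# (values are A's winnings; first index = A's action, second = B's action).
V = {(1, 1): 2, (1, 2): -3, (2, 1): -3, (2, 2): 4}
actions = [1, 2]

# A goes first: A commits, then B minimizes, so A picks the action
# whose worst case is best.
a_first = max(min(V[a, b] for b in actions) for a in actions)

# A goes second: B commits, then A maximizes, so B picks the action
# with the smallest best case for A.
a_second = min(max(V[a, b] for a in actions) for b in actions)

print(a_first, a_second)  # -3 2: going second is better for A
```

Both A-first branches bottom out at −$3, matching the board diagram, while going second guarantees A the $2 cell.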
We learned that with pure strategies, going second is better. [00:48:56] That sounds intuitive, right? Okay, so far so good. The question I want to think about now is: what if we have mixed strategies? What's going to happen? With mixed strategies, is going second better, worse, or the same? [00:49:20] So let's say player A comes in and says: I'm going to reveal my strategy to you. I'm going to flip a coin, and depending on how it comes up I'm either going to show you one or show you two. That's what I tell you I'm going to do. [00:49:34] So what would be the value of the game under that setting?
The value V(π_A, π_B), where π_A is already this mixed strategy of one-half, one-half, is going to be equal to what? We iterate over the four outcomes: π_B(1) · (1/2) · 2 (B shows one, A shows one with probability 1/2, and we get 2), plus π_B(1) · (1/2) · (−3) (B shows one, A shows two, and we get −3), plus π_B(2) · (1/2) · 4 (B shows two, A shows two, and we get 4), plus π_B(2) · (1/2) · (−3) (B shows two, A shows one, and that's −3). [00:50:49] So I just iterated over the four options we can get here, under a policy where π_B chooses one or two and π_A just follows this mixed strategy. What is this equal to? It's equal to −(1/2) · π_B(1) + (1/2) · π_B(2).
So that's the value. Okay, so again the setting is: agent A came in and told me, I'm following this mixed strategy, this is the thing I'm going to do. What should I do as agent B? [00:51:32] Right: you always show one. But why is that? Well, if agent A tells me what they're going to do, I should try to minimize agent A's value, because I don't want agent A to get anything. So I'm trying to come up with a policy that minimizes this expression. These are probabilities, so they're nonnegative numbers, and I have a negative term and a positive term here. The way to minimize the expression is to put as much weight as possible on the negative term and as little as possible on the positive term.
That tells me: never show two, always show one. Does everyone see that? So the best thing I can do as agent B is to follow a pure strategy that always shows one and never shows two. [00:52:24] Okay, this was kind of interesting: if someone comes in and tells me the mixed strategy they're going to follow, I have a response to that, and that response is actually always a pure strategy. I hope that's cool. [00:52:45] This is what happens in the more general case, too. I'm going to make a lot of generalizations in this lecture: I show one example and then generalize it, but if you're interested in the details we can talk offline. So the setting is: for any fixed mixed strategy π_A, where A has told me their mixed strategy, what I should do as agent B is pick π_B to minimize that value.
And that minimum can be attained by a pure strategy. So the second thing we've learned here is: if player A plays a mixed strategy, player B has an optimal pure strategy. That's kind of interesting, right? [00:53:53] Okay, but we still haven't decided what the policies should be. So far we've been talking about the setting where agent A comes in and tells us their policy, and we know how to respond to it with a pure strategy. Now I want to figure out what this mixed strategy should actually be. I want to think about it more generally, so I'll go back to those two diagrams and modify them.
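That best-response claim can be sketched numerically, again assuming the example's payoff matrix. The helper `value` and the candidate grid are my own scaffolding; the point is that the expected value is linear in B's mixing probability, so the minimizer lands at an endpoint, i.e. a pure strategy:

```python
# Best response to a revealed mixed strategy.  A announces "show one
# with probability 1/2"; B searches over its own mixing probability.
V = {(1, 1): 2, (1, 2): -3, (2, 1): -3, (2, 2): 4}

def value(pi_a1, pi_b1):
    """Expected payoff to A, given each player's probability of showing one."""
    pa = {1: pi_a1, 2: 1 - pi_a1}
    pb = {1: pi_b1, 2: 1 - pi_b1}
    return sum(pa[a] * pb[b] * V[a, b] for a in (1, 2) for b in (1, 2))

# Against pi_a1 = 1/2 the value works out to 1/2 - pi_b1, linear in B's
# probability, so B's minimizer sits at an endpoint: a pure strategy.
candidates = [i / 10 for i in range(11)]
best = min(candidates, key=lambda q: value(0.5, q))
print(best, value(0.5, best))  # 1.0 -0.5: B always shows one, A loses $1/2
```

The grid search is only illustrative; linearity already guarantees an endpoint optimum, which is the lecture's point.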
Okay, I'm going to think about both settings. Let's say again that player A decides to go first, and player A is going to follow a mixed strategy. That's all we know; we don't know which mixed strategy player A will decide to follow. Player A is maximizing, and the way I'm writing the mixed strategy more generally is: player A shows one with probability p and shows two with probability 1 − p, for some value of p. [00:55:04] After that it's player B's turn, and we have just seen that the best thing player B can do is a pure strategy: player B either picks one 100% of the time or picks two 100% of the time. [00:55:43] The thing is that the strategies are
probabilities, so they're values from 0 to 1, and you always end up with this negative term that you're trying to make as negative as possible and this positive term that you're trying to keep as small as possible. That's intuitively why you end up with a pure strategy: you put all of your probability on the negative term and none on the positive term, because you're trying to minimize. So you would never get something like 1/2 and 1/2; that would be a mixed strategy, not a pure strategy, and I'm saying you wouldn't get a mixed strategy because you always end up in this
setting where, to minimize, you push all of your probability onto the negative term. [00:56:39] All right, let me go back to this. We have the setting where player A goes first, following a mixed strategy with p and 1 − p, and player B is going to follow a pure strategy, either one or two; I don't know which one yet. So what happens? If B picks one: with probability p, A shows one too, and that gives value 2, so it's 2 times p; plus, with probability 1 − p, A picks two while B picks one, and we get −3, so (1 − p) times (−3). [00:57:24] Then for the other side, if B picks two: with probability 1 − p, A shows two as well, and I get 4, so it's 4 times (1 − p); and with probability p, A is going to show one while I'm
going to show two, so that is −3 times p. All right, what are these equal to? The first is 2p − 3(1 − p) = 5p − 3, and the second is 4(1 − p) − 3p = −7p + 4. [00:58:06] So in this more general case, player A plays first, following a mixed strategy, but doesn't yet know which p to choose; they're choosing some p and 1 − p here, and then player B has to follow a pure strategy, that's what we decided. Under that, we either get 5p − 3 or −7p + 4. What should player B do here? This is player B, at this min node: player B should pick whichever of the two is smaller, so player B takes the minimum of 5p − 3 and −7p + 4. [00:58:56] What should player A do? I'm thinking minimax, right? So when you think about the
minimax play, player A is maximizing the value that comes up here, and I'm also saying player A needs to decide which p they're picking, so they pick a p that maximizes it. [00:59:30] [A question about these computations.] Yeah, these are the four entries of my payoff matrix. I'm saying: with probability p, A is going to show me one, and if I go down the branch where B is also choosing one, both of us are showing one, so I get two dollars; that's where the 2 comes from, times probability p. With probability 1 − p, A is going to show me two while I show one; that's −3, times probability 1 − p. So for this particular branch I know the payoff is going to be 5p − 3. Does that make sense? And then for this side,
again, with probability 1 − p, A is going to show me two, and if both of us show two I get $4; that's why it's 4 times (1 − p). With probability p, A is going to show me one while I show two, so I lose $3; that's −3 times p. So that branch is −7p + 4. [01:00:27] And then what the second player does is minimize between these two values: they're deciding whether to pick one or two based on which one minimizes. I'm writing it using this variable p, which is not decided yet, and this variable p is the thing player A needs to decide. What p should player A pick? Player A should pick the p that maximizes, so I'm literally writing a minimax relationship here. All
right, so the interesting thing here is that 5p − 3 is a line with positive slope, and −7p + 4 is another line, with negative slope. Player B takes the minimum of these two lines; where is that minimum largest? Where the lines meet each other. So the p I'm going to pick is the value where the two are equal, 5p − 3 = −7p + 4, and that turns out to be p = 7/12, where the value is −1/12. [01:02:07] Okay, let's recap. What did I do? I'm talking about the simultaneous game, but I'm relaxing it and making it sequential. I'm saying
A is going to play first, B is playing second. A, playing first, decides to choose a mixed strategy; maybe A says 1/2-1/2, or maybe A doesn't want 1/2-1/2 and comes up with some other probabilities. So what A is deciding is: should I pick one with probability p and two with probability 1 − p, and what should that p be? Whatever A decides with p and 1 − p ends up in two different results, and B tries to minimize between those two linear functions. The two linear functions meet at one point; that's where the minimum is largest, and it corresponds to the p-value that A, who wants to maximize, should pick.
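The recap above can be written out exactly in a few lines. This is a sketch under the example's payoffs; `inner_min` is my name for the value B's best response leaves A with:

```python
# A moves first with a mixed strategy (show one w.p. p).  B's two pure
# responses leave A with 5p - 3 (B shows one) or -7p + 4 (B shows two);
# B takes the minimum, and A picks the p that maximizes that minimum.
from fractions import Fraction  # exact arithmetic instead of floats

def inner_min(p):
    """Value A gets for a given p, after B best-responds (minimizes)."""
    return min(5 * p - 3, -7 * p + 4)

# One line increases and the other decreases, so their minimum peaks
# where they cross: 5p - 3 = -7p + 4, i.e. 12p = 7.
p_star = Fraction(7, 12)
print(p_star, inner_min(p_star))  # 7/12 -1/12

# Sanity check: no p on a coarse grid does better than p* = 7/12.
assert all(inner_min(Fraction(i, 12)) <= inner_min(p_star) for i in range(13))
```

Using `Fraction` keeps 7/12 and −1/12 exact, matching the values read off the board.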
I know this requires a little bit of thinking, but any clarification questions? I see a lot of lost faces. [01:03:20] Yeah, that interesting point is exactly right: A is due to lose either way. Even when A comes up with the best mixed strategy it can, showing one with probability 7/12 and showing two with probability 5/12 (this comes from here), A is still losing 1/12 of a dollar under that scenario. [01:03:49] Also, I haven't solved the simultaneous game yet, that's right; I have only talked about the setting where A plays first. So what if B plays first? I'm going to swap this: A goes second, B plays first; I'm going to modify this one now. [01:04:11] B goes first, A goes second. B starts by revealing its strategy, and that strategy
is again: with probability p I show you one, with probability 1 − p I show you two. Then A plays, trying to maximize, and A can play a pure strategy, because the best thing A can do is a pure strategy: always going with either showing one or two, and A is deciding which, but doesn't know yet. The values here are going to be exactly the same as before: 5p − 3 and −7p + 4. [01:04:56] So what's happening here? In this case A is playing second, and what A likes to do is maximize between 5p − 3 and −7p + 4; then B, going first, has to pick the p that minimizes that. These are exactly the same two lines, but now I'm picking the maximum of them, and the
[01:05:25] The maximum of these two lines ends up being at exactly the same point as before; it is exactly the same p as before, and it gives you exactly the same value as before. So this is also equal to -1/12. What this is telling me is that if you're playing a mixed strategy, even if you reveal your best mixed strategy at the beginning, it doesn't matter; it actually doesn't matter whether you're going first or second. So in the Morra game you were playing: if you were playing a mixed strategy and you told your opponent "this is the thing I'm going to do, and it's a mixed strategy", it doesn't matter whether they know it or not; you still get the same value. So again you get 5p - 3 and -7p + 4, and now you're taking the maximum of these two lines.
[01:06:15] The maximum of these two lines ends up being at the same point; you pick the p that maximizes it, and you get the same value. This is called von Neumann's theorem: for this whole thing that we just did on this one example, there's a theorem that says that for every simultaneous two-player zero-sum game with a finite number of actions, the order of play doesn't matter. Whether A is playing second or B is playing first, the values are going to be the same; whether you're minimizing over a maximum or maximizing over a minimum of that value, it's the same thing. So this is the third thing we just learned, von Neumann's minimax theorem. Writing a simpler, shorter version of it: if playing mixed strategies, the order of play doesn't matter. And remember, if you play a mixed strategy, your opponent is going to play a pure strategy.
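As a quick check of the numbers above, here is a small Python sketch. The payoff matrix is reconstructed from the lecture's two lines 5p - 3 and -7p + 4 (it matches the standard two-finger Morra payoffs), so treat it as an assumption rather than something stated explicitly here:

```python
from fractions import Fraction

# Payoffs for player A in two-finger Morra (rows: A shows 1 or 2 fingers;
# columns: B shows 1 or 2). Reconstructed from the lines 5p - 3 and -7p + 4.
V = [[2, -3],
     [-3, 4]]

def value_if_A_reveals(p):
    """A commits to P(show 1) = p; B then best-responds, minimizing A's payoff."""
    ev_b1 = p * V[0][0] + (1 - p) * V[1][0]   # = 5p - 3
    ev_b2 = p * V[0][1] + (1 - p) * V[1][1]   # = -7p + 4
    return min(ev_b1, ev_b2)

def value_if_B_reveals(q):
    """B commits to P(show 1) = q; A then best-responds, maximizing A's payoff."""
    ev_a1 = q * V[0][0] + (1 - q) * V[0][1]
    ev_a2 = q * V[1][0] + (1 - q) * V[1][1]
    return max(ev_a1, ev_a2)

# The two lines cross where 5p - 3 = -7p + 4, i.e. p = 7/12.
p_star = Fraction(7, 12)
print(value_if_A_reveals(p_star))   # -1/12
print(value_if_B_reveals(p_star))   # -1/12
```

Either order of revelation gives the same game value, -1/12, which is exactly the von Neumann statement for this example.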
[01:07:31] This is the first point, right: if you play a mixed strategy, your opponent is going to follow a pure strategy. (Student question, partly inaudible: one of the two answers looks valid; it will be either one or two, and in that case the second one...) Yeah, so the thing is, these two end up being equal, so it doesn't matter. The way for you to maximize this is to land at the point where the two branches end up being equal: if you actually plug in p equal to 7/12 here, these two values end up being equal. That's not an approximation; they're actually equal. And the reason they end up being equal is that you are trying to minimize the thing that this guy is trying to maximize, so you're trying to
pick the P that actually makes this thing equal so no matter what your [01:08:32] thing equal so no matter what your opponent's does like you're gonna like [01:08:35] opponent's does like you're gonna like get the best thing that you can do so so [01:08:36] get the best thing that you can do so so yeah like think of it like this okay so [01:08:38] yeah like think of it like this okay so I'm player a I'm still I still have a [01:08:40] I'm player a I'm still I still have a choice my choice is to pick a P I want [01:08:42] choice my choice is to pick a P I want to pick a P that I'm not gonna like lose [01:08:45] to pick a P that I'm not gonna like lose as much what P should I pick I should [01:08:47] as much what P should I pick I should pick a P that makes these choices the [01:08:49] pick a P that makes these choices the same because if I pick a P that makes [01:08:51] same because if I pick a P that makes this one higher than this one of course [01:08:53] this one higher than this one of course the second player is going to make me [01:08:54] the second player is going to make me lose and then go down the routes that's [01:08:56] lose and then go down the routes that's that's better for the second player so [01:08:58] that's better for the second player so the best thing that I can do here is [01:08:59] the best thing that I can do here is make these two as equal as possible so [01:09:02] make these two as equal as possible so then the second player whatever they [01:09:04] then the second player whatever they choose choose one or two like it's gonna [01:09:06] choose choose one or two like it's gonna be the same thing it's going to be those [01:09:08] be the same thing it's going to be those does that make sense no expectations we [01:09:12] does that make sense no expectations we multiplied by P and one was easier [01:09:14] multiplied by P and one was easier saying like oh so in expectation you're [01:09:17] saying like oh so in expectation you're saying when 
[01:09:18] I'm treating p as a variable that I'm deciding, right: p is the thing I get to decide. I'm player A, and I've got to decide on a p that's not going to be too bad for me. Let's say I picked a p that doesn't make these things equal; say it makes this one 10 and this one 5. The second player is of course going to make me lose, and is going to pick the thing that's worst for me. So the best thing I can do is make both of them, I don't know, 7, so it's not as bad. That's kind of the idea. All right, let's move forward, because there are so many things happening. OK, so the key idea here is that revealing your optimal mixed strategy does not hurt you, which is kind of a cool idea.
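The claim can be made concrete as an optimization. For player A committing to a mixed strategy π over its actions, with payoff matrix V(a, b), each pure response b of the opponent contributes one constraint, and the best revealed strategy solves:

```latex
\max_{\pi,\, v} \; v
\quad \text{s.t.} \quad
\sum_{a} \pi(a)\, V(a, b) \;\ge\; v \quad \text{for all } b,
\qquad
\sum_{a} \pi(a) = 1,
\qquad
\pi(a) \ge 0 .
```

This is a linear program with one constraint per opponent action; its dual is the opponent's corresponding minimization, and LP duality giving equal optimal values is one way to prove the minimax statement.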
[01:10:08] The proof of that is interesting; if you're interested, look at the notes. You can use linear programming here. The intuition behind it is that if you're playing a mixed strategy, the other person has to play a pure strategy, and they have n possible options for that pure strategy. That creates n constraints that you put into your optimization; you end up with a single optimization with n constraints, and then you can use linear programming duality to actually solve it. So you could compute this using linear programming; that's the one-line summary here. So let's summarize what we have talked about so far. We have talked about these simultaneous games, and we've talked about the setting with pure strategies, where we saw that going second is better. Going second is better if they're
just telling you what pure strategy they're using, right. That was the first point. [01:10:57] And then if you're using mixed strategies, it turns out it doesn't matter whether you're going first or second: you tell them what your best mixed strategy is, and they're going to respond based on that. That's the von Neumann minimax theorem. All right, for the next ten minutes I want to spend a little bit of time talking about non-zero-sum games. So far we have talked about zero-sum games, where it's minimax: I get some reward and you get the negative of it, or vice versa. There are also these other things called collaborative games, where we are both just maximizing the same thing, so we both get money out of it, and that's like a single optimization, a single maximization; you can think of it as just doing search. In real life you're often somewhere in
between those, and I want to motivate that with an example: the prisoner's dilemma. [01:11:44] How many of you have heard of the prisoner's dilemma? OK, good. The idea of the prisoner's dilemma is that a prosecutor asks A and B individually whether they will testify against each other or not. If both of them testify, both are sentenced to five years in jail. If both of them refuse, both are sentenced to one year in jail. If only one testifies, then he or she gets out free, and the other one gets a ten-year sentence. Play with your partner real quick. [01:12:33] OK, so let's look at the payoff matrix. I think you have an idea of how the game works by now. You have two players, A and B, and each of you has an option: you can either testify or refuse to
testify. [01:12:58] I'm going to create this payoff matrix, and it is now going to have two entries in each of these cells. Why is that? Because we have a non-zero-sum game. Before, our payoff matrix had only one entry, because it was for player A, and player B would just get the negative of it. But now A and B are getting different values. If both of us testify, both of us get five years of jail: A gets five years, B gets five years. If both of us refuse, A gets one year of jail and B gets one year of jail. And if one of us testifies and the other refuses, one gets zero and the other gets ten years of jail: if A refuses to testify while B testifies, A gets ten years and B gets zero, and in the opposite case A gets 0 and B gets ten. So now,
for every player, we are going to have a payoff matrix. [01:14:04] So now we have this value function V, which is a function of a player, a policy π_A, and a policy π_B, and it is the utility for that particular player, because you might be looking at the game from the perspective of different players. OK, so the von Neumann minimax theorem doesn't really apply here, because we don't have a zero-sum game, but you actually get something a little bit weaker, and that's the idea of a Nash equilibrium. A Nash equilibrium is a pair of policies π*_A and π*_B such that no player has an incentive to change their strategy. What does that mean? If you look at the value function from the perspective of player A, the value for A at the Nash equilibrium, at π*_A and π*_B, is greater than or equal to the value for A of any other policy π_A if you
fix π*_B: V_A(π*_A, π*_B) ≥ V_A(π_A, π*_B) for all π_A. [01:14:58] And at the same time, the same thing is true for the value of B: the value for agent B at the Nash equilibrium is greater than or equal to the value for B at any other π_B if π*_A is held fixed, that is, V_B(π*_A, π*_B) ≥ V_B(π*_A, π_B) for all π_B. OK, so what does that mean in this setting? Do we have a Nash equilibrium here? Let's say I start from this cell, A equal to minus 10, B equal to 0. Can we make this better? (Did I flip them? I only flipped these, right: 0, minus 10 and minus 10, 0.) OK, so let's say I start from this cell: A gets 0 years of jail, which is pretty good, and B gets 10 years of jail, which is not that great. So B has an incentive to change that, right: B has an incentive to move in this direction and get 5 years of jail instead of 10. Similar thing here: what if we start
from here: A has one year of jail, B has one year of jail. [01:16:03] A has an incentive to change this and get 0 years of jail, and B has an incentive to change and get 0. And we end up in this cell, where neither of us has any incentive to change our strategy. So we have one Nash equilibrium here, and that Nash equilibrium is both of us testifying and both of us getting 5 years of jail. It's kind of interesting, because there is a socially better choice here: if both of us were to refuse, we would each get one year of jail, but that's not going to be a Nash equilibrium. [01:16:44] All right, so there's a theorem, Nash's existence theorem, which basically says that in any finite-player game with a finite number of actions, there exists at least one Nash equilibrium. And this is usually a
mixed-strategy Nash equilibrium. [01:17:00] In this case it's actually a pure-strategy Nash equilibrium, but in general there's at least one Nash equilibrium in a game like this one. OK, all right, so let's look at a few other examples. Two-finger Morra: what would be a Nash equilibrium for that? We actually just solved that using the von Neumann minimax theorem, right: it would be playing the mixed strategy of 7/12 and 5/12. You might also modify your two-finger Morra game and make it collaborative. In a collaborative setting, what that means is that we both get two dollars, or we both get four dollars, or we both lose three dollars. So a collaborative two-finger Morra game is not a zero-sum game anymore, and it has two Nash equilibria: a setting where A and B
both play one and the value is two, or where A and B both play two and the value is 4. [01:18:04] And then the prisoner's dilemma is the case where both of them testify; we just saw that on the board. All right, so the summary so far: we've talked about simultaneous zero-sum games, and we talked about the von Neumann minimax theorem, where you can have multiple minimax strategies but a single game value; we had a single game value because the game was zero-sum. In the case of non-zero-sum games we have something slightly weaker, Nash's existence theorem: we can have multiple Nash equilibria, and we also have multiple game values, depending on whose perspective you're looking at. So this was just a brief, short introduction to game theory.
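The equilibrium claims in this summary can be double-checked by brute force. A sketch: enumerate all pure strategy profiles and keep those where neither player can gain by deviating unilaterally (jail years are written as negative utilities; the payoff numbers follow the lecture):

```python
from itertools import product

def pure_nash(payoff_A, payoff_B):
    """Enumerate pure-strategy Nash equilibria of a two-player matrix game."""
    n, m = len(payoff_A), len(payoff_A[0])
    equilibria = []
    for a, b in product(range(n), range(m)):
        best_A = all(payoff_A[a][b] >= payoff_A[a2][b] for a2 in range(n))
        best_B = all(payoff_B[a][b] >= payoff_B[a][b2] for b2 in range(m))
        if best_A and best_B:
            equilibria.append((a, b))
    return equilibria

# Prisoner's dilemma (index 0 = testify, 1 = refuse), negative years in jail.
A = [[-5, 0], [-10, -1]]
B = [[-5, -10], [0, -1]]
print(pure_nash(A, B))   # [(0, 0)]  -> both testify

# Collaborative two-finger Morra: both players share the same payoff.
C = [[2, -3], [-3, 4]]
print(pure_nash(C, C))   # [(0, 0), (1, 1)]
```

The prisoner's dilemma has the single pure equilibrium (testify, testify), while the collaborative Morra variant has the two equilibria (1, 1) and (2, 2), matching the discussion above.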
[01:18:53] In game theory and economics there is a huge literature around different types of games; if you're interested in that, take classes on it. And yeah, there are other types of games too, like security games or resource allocation games, that have some characteristics similar to the things we have talked about. If you're interested in any of them, take a look; they could be useful for projects. And with that, I'll see you guys next time.

================================================================================
LECTURE 023
================================================================================
Constraint Satisfaction Problems (CSPs) 1 - Overview | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=-IO4fPO0rxk
---
Transcript

[00:00:05] Hi. In this module I'm going to talk about constraint satisfaction problems. [00:00:12] Before we get into constraint satisfaction problems, I just want to revisit where we've been in the course. We started off with machine learning, applied to reflex-based models such as classification or regression, where the
goal is just to output a single number or a label. [00:00:32] Then we looked at state-based models, where the goal was to output a solution path, and we thought in terms of states, actions, and costs or rewards. And now we're going to embark on a new journey through variable-based models. It's going to be a new paradigm for modeling, in which we're going to think in terms of variables and factors. [00:00:55] The heart of variable-based models is an object called a factor graph. We're going to define factor graphs formally in the next module, but for now let's just try to give some intuition. A factor graph consists of a set of variables, usually denoted x1, x2, x3; these are drawn in circles. A factor graph also contains a set of factors, usually denoted f1, f2, f3, f4; these are drawn in squares. Each factor, as you'll notice here, touches a subset of the variables.
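As a rough code sketch of that picture (the domains and the concrete contents of f1 through f4 below are invented for illustration; the formal definition comes in the next module):

```python
# Three variables and four factors, mirroring the x1..x3 / f1..f4 picture.
# Each factor is stored with the subset of variables it touches (its scope).
factors = {
    "f1": (("x1",),      lambda x1: x1 == 1),        # touches only x1
    "f2": (("x1", "x2"), lambda x1, x2: x1 != x2),   # relates x1 and x2
    "f3": (("x2", "x3"), lambda x2, x3: x2 != x3),   # relates x2 and x3
    "f4": (("x3",),      lambda x3: x3 == 1),        # touches only x3
}

def weight(assignment):
    """Product of all factor values; the 'best' assignment maximizes this."""
    w = 1
    for scope, f in factors.values():
        w *= f(*(assignment[v] for v in scope))
    return w

print(weight({"x1": 1, "x2": 0, "x3": 1}))  # 1: every factor is satisfied
print(weight({"x1": 1, "x2": 1, "x3": 1}))  # 0: f2 is violated
```

Here every factor is a 0/1 constraint, so the best assignments are exactly the ones with weight 1; factors can also return general nonnegative preferences.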
[00:01:36] Each factor is going to express some sort of preference about, or determine the relationship among, the subset of variables it touches. So for example, f2 is going to specify how x1 and x2 are related, f3 is going to specify how x2 and x3 are related, and f4 is going to specify how x3 itself should behave. The objective of a constraint satisfaction problem is to find the best assignment of values to the variables, where we're going to define what "best" means in a second. [00:02:12] So let's look at an example of a problem that can be solved via a constraint satisfaction problem. Here's map coloring, a classic problem. Here is a map of Australia. We have a number of provinces, seven to be exact, and each province (Western Australia, Northern Territory, South Australia, etc.) has to be assigned a color. The question is: how can we color each province either red, green, or blue so that no two
neighboring provinces have the same color? So we don't want Western Australia and Northern Territory to have the same color.

Here is one possible solution: we can color Western Australia red, Northern Territory green, and so on, and you can double-check that no two adjacent provinces have the same color here. Now, this is a simple enough problem that we can just solve it by hand, but as usual we want to ask: what are the algorithmic principles, and how do we come up with something more general to solve problems such as these when we encounter them?

Before we talk about how we do this with constraint satisfaction problems, I want to revisit how we might do it as a state-based model, because that's the hammer we have. So let's try to cast this as a search problem. We're going to start with an initial state, and this state is going to
represent not having assigned any provinces any colors. Then from that state we can take three possible actions: we can grab WA and assign it red, we can grab WA and assign it green, or we can grab WA and assign it blue. From each of these points we can take NT and assign it red, green, or blue; red, green, or blue; red, green, or blue. You can see that this is a search tree like the ones that we have studied before.

At the very bottom of the search tree we have a complete assignment to all the variables, and each complete assignment is going to be labeled with a zero if it is inconsistent, in other words if it doesn't solve the problem. Here the problem is that NT and SA are assigned the same color; that's bad. Here's another complete assignment; this is also bad because WA and NT share the same color. Here is an assignment that is good, and
you can verify that all the provinces that are neighboring each other have different colors; this is going to be denoted with a weight of one. So in general, each state here represents a partial assignment of colors to variables, and at the end of the day we can simply return any leaf that is consistent, for example this one.

So this is a perfectly fine way of solving this problem, and it goes to show how powerful these state-based models can be. Just to recap: the state here is a partial assignment of colors to provinces, and from each state an action assigns the next uncolored province a compatible color. So what's missing? Why are we talking about this when we already know how to solve it using a state-based model? Well, the question is: can we do better than this? The answer is going to be yes, because there is more problem structure.
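The state-based search just described (assign provinces in a fixed order, then check each leaf for consistency) can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the course; only the province abbreviations and the red/green/blue domain come from the lecture's map-coloring example.

```python
# Hypothetical sketch: map coloring of Australia as naive tree search.
# States are partial assignments; complete leaves get weight 1
# (consistent) or 0 (inconsistent).
NEIGHBORS = [("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"),
             ("SA", "Q"), ("SA", "NSW"), ("SA", "V"), ("Q", "NSW"),
             ("NSW", "V")]
PROVINCES = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
COLORS = ["R", "G", "B"]

def consistent(assignment):
    """Weight-1 leaves: no two neighboring provinces share a color."""
    return all(assignment[a] != assignment[b] for a, b in NEIGHBORS)

def search(assignment):
    """Assign provinces in a fixed order; return the first consistent leaf."""
    if len(assignment) == len(PROVINCES):
        return assignment if consistent(assignment) else None
    province = PROVINCES[len(assignment)]   # next uncolored province
    for color in COLORS:
        result = search({**assignment, province: color})
        if result is not None:
            return result
    return None

solution = search({})  # a complete, consistent coloring
```

Note that this version only checks consistency at the leaves, exactly like the search tree in the lecture; the speedups discussed next come from checking and pruning much earlier.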
Let me say what I mean by that. Notice that in this problem there's just a bunch of provinces that all need to get assigned colors; it doesn't matter in which order I assign the colors. In other words, the variable ordering doesn't affect correctness, which means that we don't have to stick with a fixed ordering: we can optimize this ordering, and this is something that the inference algorithm can do for us. Secondly, the variables here are interdependent in only a local way, and we can decompose the problem. For example, here we see that Tasmania is completely separated from the rest of Australia, which means that we can effectively solve the two separate independent problems separately and just combine the solutions. As we'll see later, this is great because it allows us to really speed up search.
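The decomposition point can be made concrete: treat the provinces as nodes and shared constraints as edges, and Tasmania ends up in its own connected component. A hypothetical sketch (the helper names are mine, not from the lecture):

```python
# Hypothetical sketch: finding independent subproblems as connected
# components of the constraint graph. Tasmania ("T") shares no
# constraint with the mainland, so it can be colored separately.
NEIGHBORS = [("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"),
             ("SA", "Q"), ("SA", "NSW"), ("SA", "V"), ("Q", "NSW"),
             ("NSW", "V")]
PROVINCES = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]

def components(variables, edges):
    """Group variables linked, directly or indirectly, by some factor."""
    adj = {v: set() for v in variables}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for v in variables:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

Here `components(PROVINCES, NEIGHBORS)` yields two components, the six mainland provinces and Tasmania alone, so the search trees multiply (3^6 * 3) instead of growing as one 3^7 tree.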
So variable-based models allow us to capture these two additional pieces of structure. "Variable-based models" is an umbrella term that includes constraint satisfaction problems, Markov networks, and Bayesian networks, all of which we're going to get through over the next few weeks. The key idea behind variable-based models is that we want to think in terms of variables, and a solution to a problem is simply an assignment to the variables. So when you're modeling with variable-based models, you want to set up a set of variables so that the solution is an assignment to those variables. The decisions about how to choose the ordering of the variables and how to determine which variables to set first are going to be made by the inference algorithm. And the key idea here is that you can think about variable-based models as a higher-level modeling language
than state-based models. So here's an imperfect analogy from programming languages. If you were just trying to solve a problem directly, in an ad hoc way, that's kind of like writing in assembly: you just go at it. If you were using, you know, C or C++, that's kind of like using state-based models: it gives you a higher-level abstraction, which is powerful and saves you a lot of headaches. But variable-based models are an even higher-level language, let's say Python, which allows you to think purely in terms of the variables and the modeling, and to let the inference algorithm do more of the work, which is always good, because then you can spend more time doing the fun stuff, which is the modeling.

So first I'm going to talk about constraint satisfaction problems. Constraint satisfaction problems appear
in a number of applications, most of which revolve around large-scale logistics, scheduling, and supply chain management. Companies such as Amazon have to figure out how to put packages on vehicles and deliver them to customers, while at the same time minimizing costs and meeting all those promised delivery times. Here the variables might be the assignment of packages to vehicles, and the factors would include travel times and various costs. Ride-sharing services such as Uber and Lyft also have to figure out how to best assign drivers to riders, and all of these are extensions of the classical vehicle routing problem.

Here's another example from sports scheduling: every year, the NFL has to schedule which teams play which other teams and when these games are going to be held. The schedule should minimize the travel times of
teams, the games have to be at times that fit the TV broadcast schedule, you want to be fair across teams, and so on. Other scheduling problems such as these also involve assigning courses to slots: the registrar's office has a number of courses that need to be offered every quarter, and they have to figure out which classrooms to hold these courses in and at which time slots, again trading off various constraints like preferences and availability.

A final application of constraint satisfaction problems is a little bit different: this is called formal verification of circuits and programs. Say you have a computer program and you want to prove that this program is correct; let's say the program is trying to do something like sort numbers. Normally you would, let's say,
test the program: design a bunch of test cases, run the program, and see what happens. But then how do you know for sure that it works on all inputs? This is where verification comes in: you want to actually check that it works for all inputs. The way you would set this up is that you define a set of variables which correspond to the unknown inputs to the program, and then the factors encode the program itself: they're going to encode how execution proceeds from line to line. Then you're going to ask whether there exists a program input that produces an error or an incorrect result. So unlike the other applications of CSPs, where you're trying to find a satisfying assignment, in formal verification you're trying to prove that no such satisfying assignment exists, because such an assignment would mean an error in your program.
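To make that framing concrete, here is a toy sketch (my own example, not from the lecture): the variables are a program's inputs, and an assignment is "satisfying" when the program disagrees with a specification. Finding a satisfying assignment exhibits a bug; finding none verifies the program over this small domain.

```python
# Toy sketch: verification as search for a satisfying assignment.
# Variables: the unknown inputs (a, b). A satisfying assignment is
# one on which the program's output is wrong, i.e. a bug.
from itertools import product

def careful_max(a, b):
    return a if a >= b else b          # correct maximum of two numbers

def broken_max(a, b):
    return a if a > b else a           # bug: always returns a

DOMAIN = range(-2, 3)                  # exhaustively check all small inputs

def find_counterexample(program):
    """Return an input pair on which the program is incorrect, or None."""
    for a, b in product(DOMAIN, repeat=2):
        if program(a, b) != max(a, b):
            return (a, b)
    return None  # no satisfying assignment exists: verified on this domain
```

Real verifiers encode the program's semantics symbolically in the factors rather than enumerating inputs; this sketch only illustrates the "prove no satisfying assignment exists" logic.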
So here is a road map for the rest of the modules on CSPs. First we're going to talk about the definition of a constraint satisfaction problem and of factor graphs, and do it more formally. Then we're going to give a few examples of CSPs. Then we're going to move over to inference. We're going to start by talking about backtracking search, which is unfortunately exponential time in the worst case, but there are a number of ways to speed up search. Taking full advantage of the fact that we can assign variables in any order, we can look at dynamic ordering, where we use heuristics to figure out which variables to assign first. Then we're going to look at a pruning strategy based on arc consistency, which is going to allow us to prune out, for each of the variables, values which are not promising to explore, so that dynamic ordering can be
much more effective.

But in case you're impatient and don't want to wait an exponential amount of time, and you're satisfied with an approximate solution, you can also do approximate search. Here there are two algorithms: beam search, which is kind of an extension of the greedy search algorithm but a little bit smarter (it's going to explore only a small fraction of the exponentially sized search tree), and local search, which is going to take an initial assignment to all the variables and just try to improve it by changing one variable at a time. All right, so that's it for this overview module.

================================================================================
LECTURE 024
================================================================================
Constraint Satisfaction Problems (CSPs) 2 - Definitions | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=uj5wCcHsSlA
---
Transcript

Hi, in this module I'm going to formally define constraint satisfaction problems and the more general notion of a factor
graph. So let's begin with an example, a voting example. Let's imagine there are three people, person one, person two, and person three, and each one is going to cast a vote, either blue or red. And we know something about these people: we know that person one is definitely going to vote blue, and we know that person three is leaning red. We also know that person one and person two are really close friends, so they must agree on their vote, whereas person two and person three are mere acquaintances, and their votes only tend to agree. So the question is: how are all these people going to influence each other and ultimately cast their votes?

We can model this problem using a factor graph. We're going to define a set of variables, x1 for person 1, x2 for person 2, x3 for person 3, and we're going to define a set of factors
that capture each of these four constraints or preferences. Let's begin with f1. f1 is going to capture the fact that person 1 is definitely blue. I'm going to write f1 as a table specifying, for each value of x1, a number: f1 of x1 is going to be 0 if x1 is r (red), and it's going to be 1 if x1 is b (blue). This captures the fact that 0 means there's no way this is going to happen, and 1 means it's okay. Mathematically, I can write this factor f1 as an indicator function of x1 = b. Usually you would write a 1 in front of these indicator functions, but I'm just going to drop it for notational simplicity.

Now let's look at "leaning red." This factor is going to be f4, and it's also going to correspond to a table, where for every possible value of x3 I'm
going to specify a value: r is going to be 2 and b is going to be 1. Mathematically, f4 is equal to the indicator function of x3 = r, plus a smoothing constant of 1. Remember, this indicator is going to return one or zero depending on whether its condition is true or false, and I'm adding one, so I offset that to a two or a one. Intuitively, you can think about this as person three preferring r, maybe twice as much.

Now let's look at these other factors. f2 is going to represent the fact that person 1 and person 2 have to agree. Again, I'm going to look at all the possible assignments to the variables in the scope of f2, these two variables x1 and x2, and for every value I'm going to assign a particular non-negative number. So here, for rr, I'm going to say that's a one: it's okay, they agree. If they don't agree, I'm going to return
0, because I really don't like that, and if they're both b, they agree, so that's a 1. More succinctly, I can write this factor f2 as the indicator of x1 = x2.

And now finally, f3. f3 is going to capture whether x2 and x3 tend to agree, and its table is going to look like this for x2 and x3: if they're both r I'm going to return 3, if they're different I'm going to return 2, and if they're both b then I'm going to return 3. Mathematically, this factor is going to be the indicator function of whether x2 = x3, plus a smoothing constant of 2, which turns 1 0 0 1 into 3 2 2 3.
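The four factor tables above can be written out directly. This is a hypothetical encoding (representing the values as the strings "b" and "r" is my choice, not the lecture's), which is handy for checking the numbers in the tables:

```python
# Hypothetical encoding of the four voting factors as lookup tables.
f1 = {"b": 1, "r": 0}                        # person 1 is definitely blue
f4 = {"b": 1, "r": 2}                        # person 3 leans red
f2 = {(x1, x2): 1 if x1 == x2 else 0         # persons 1 and 2 must agree
      for x1 in "br" for x2 in "br"}
f3 = {(x2, x3): (1 if x2 == x3 else 0) + 2   # persons 2 and 3 tend to agree
      for x2 in "br" for x3 in "br"}

def factor_values(x1, x2, x3):
    """Evaluate every factor on one complete assignment."""
    return [f1[x1], f2[(x1, x2)], f3[(x2, x3)], f4[x3]]
```

For example, `factor_values("b", "b", "r")` gives [1, 1, 2, 2]: person 1 is blue (1), persons 1 and 2 agree (1), persons 2 and 3 disagree (2), and person 3 is red (2). As noted shortly, these numbers are what will later get multiplied together.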
So there's kind of a mild preference for these two people to agree, compared to not agreeing. Now, if you click on the demo in the slides, it's going to take you to a little JavaScript application where you can actually write your own factor graph, and we're going to come back to this later. So this is our first example of a factor graph, capturing this simple voting situation.

Now let's look at a different example, one that we looked at in the overview module: map coloring of Australia. Remember, Australia has these seven beautiful provinces, and each one needs to be assigned a color. Each of these provinces is going to be represented as a variable, and here I'm going to give every variable a name: WA for Western Australia, NT for Northern Territory, and so on. And I'm going to use big X, usually, to denote the set of all variables. Each
variable is also going to take on a set of values, which in this case is going to be red, green, or blue. And now I'm going to define the factors of this factor graph. For every two neighboring provinces, I want to say that they can't have the same color. So for example, f1 is going to say that WA and NT must be different; that corresponds to this factor over here. f2 says that NT and Q must be different, and that's going to correspond to this factor here, and so on and so forth.

So now we're ready to formally define a factor graph. A factor graph consists of a set of variables, x1 through xn in the general case (remember, big X is going to denote the set of all variables), where each variable xi takes on values in some set of possible values known as the domain of variable i. And a factor graph also consists of a set of factors, generally denoted f1 through fm.
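Instantiated on the map-coloring example, that formal definition might look like the following sketch (my own encoding, not the course's code): each factor is a pair of the variables it touches and a non-negative function of those variables.

```python
# Hypothetical encoding of the map-coloring factor graph, following
# the formal definition: variables with domains, plus factors.
VARIABLES = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
DOMAINS = {v: ["R", "G", "B"] for v in VARIABLES}

def different(x, y):
    """A constraint: 1 if the two colors differ, 0 otherwise."""
    return 1 if x != y else 0

# Each factor is (variables touched, function); one per neighboring pair.
FACTORS = [(("WA", "NT"), different), (("WA", "SA"), different),
           (("NT", "SA"), different), (("NT", "Q"), different),
           (("SA", "Q"), different), (("SA", "NSW"), different),
           (("SA", "V"), different), (("Q", "NSW"), different),
           (("NSW", "V"), different)]

def scope(factor):
    variables, _ = factor
    return set(variables)  # the set of variables the factor depends on
```

Every factor here touches exactly two of the seven variables, which matches the terminology introduced next: each is a binary factor, and since `different` only returns 0 or 1, each is a constraint.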
Each fj is going to be a function that takes as input an assignment to the variables and returns a non-negative number. It's really important that this function return a non-negative number rather than a negative number, because later we'll see that we're going to multiply them together. So that's the definition of a factor graph.

[00:07:30] A bit of terminology here: I'm going to define the scope of a factor as the set of variables it depends on. So in the map coloring example, the scope of f1 is simply WA and NT. Visually, this corresponds to the set of variables that this factor is touching. The arity of a factor is the number of variables in the scope; in this case you just count how many variables are here, and the answer is two.
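The scope and arity definitions can be sketched in a few lines of Python. This dict-based encoding of an assignment is just one possible representation for illustration, not the demo's actual code:

```python
# Sketch of the scope/arity terminology, using the map-coloring factor f1.

def f1(assignment):
    """Factor on WA and NT: returns 1 if they differ, else 0 (a constraint)."""
    return 1 if assignment["WA"] != assignment["NT"] else 0

scope_f1 = {"WA", "NT"}      # the set of variables f1 depends on
arity_f1 = len(scope_f1)     # number of variables in the scope

print(arity_f1)                           # 2, so f1 is a binary factor
print(f1({"WA": "red", "NT": "green"}))   # 1: different colors are allowed
print(f1({"WA": "red", "NT": "red"}))     # 0: the constraint vetoes this
```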
Some shorthand notation: unary factors are ones that have arity 1, and binary factors are ones that have arity 2. Constraints are factors that return 0 or 1. So notice that a factor can return any non-negative number, but a special case is when it returns 0 or 1, which essentially means yes or no. In this context, f1 is a binary constraint. One thing to remember about factors is that each factor usually depends only on a subset of the variables, not all the variables, and this is going to be important when we talk about algorithmic efficiency.

[00:08:59] So now that we've fully defined what a factor graph is, I'm going to talk about the notion of assignment weight. Let's go back to the voting example. In the voting example we had four factors, corresponding to whether person one and person three were voting a certain way, and whether person
one and person two, and person two and person three, agreed or not.

[00:09:25] So an assignment is just an assignment of values to each of the variables; in this case there are three variables, x1, x2, x3, and each assignment is going to be associated with a weight. Here's how the weight is calculated: I'm going to go through each of these factors, plug in this assignment, and read out a particular number. So let's take this factor f1: what is x1? It's R, so I'm going to get a 0. What about this factor: what are x1 and x2? They're R, R, so I'm going to return a 1; let me copy that down here. For this factor, x2 and x3 are R, R, so I'm going to get a 3. And finally the fourth factor, f4: what is x3? It's R, so I'm going to read out a 2.
All these outputs of the factors are numbers; I'm going to multiply all of them together to get a weight, and that weight in this case is 0. Now you can go through all the other possible assignments of values to the variables; in this case there are eight possible assignments, and each of them is going to have a particular weight.

[00:10:46] So now let's look at the demo. If you click step here, that's going to run this inference algorithm and produce a weight for every possible assignment that has non-zero weight. In this case we verify that there are two possible assignments that have non-zero weight: B, B, R and B, B, B.

[00:11:16] Okay, so now let's switch over again to the map coloring example, just to see how weights are computed here. So here is a possible assignment of colors to provinces.
[00:11:33] So here, notationally, I'm going to make a slight change: it's sometimes going to be convenient to represent assignments in this kind of dictionary format, where the variables have names. So here I have WA assigned red, NT assigned green, and so on and so forth; you can literally think about this as a Python dictionary if you like.

[00:11:57] What is the weight of this assignment? Well, in this particular case all neighbors have different colors, and remember, each factor is just going to give a thumbs up, returning one, if the two adjacent neighbors have different colors. So I'm just going to get one times one times one, and that's just one. Now consider an alternative assignment where I've simply replaced NT with red here, so NT becomes red. And now we can see that the weight of this altered assignment is going to be
zero, because these two factors are going to evaluate to zero: these two here. One thing you might realize very quickly is that all it takes is one factor to veto the entire assignment, because we're multiplying: if one of the factors returns zero, then the product of all the factors is also going to be zero.

[00:13:04] So here is the general definition of assignment weight. An assignment, little x, is x1 through xn, and its weight is a function that takes an assignment and returns the product over all the factors: Weight(x) = f1(x) · f2(x) · ... · fm(x). Note that even though each factor only depends on a subset of the variables, I'm simplifying notation here by passing in the entire assignment; in practice I would pass in only the variables that are in the scope of fj.
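This weight computation, and the max-weight objective the lecture turns to next, can be sketched by brute force on the Australia map-coloring example. The adjacency list below is the standard one for this example (Tasmania is unconstrained); the exhaustive search is for illustration only and is not an efficient inference algorithm:

```python
from itertools import product

variables = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
domain = ["red", "green", "blue"]
neighbors = [("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"),
             ("SA", "Q"), ("SA", "NSW"), ("SA", "V"), ("Q", "NSW"),
             ("NSW", "V")]

# One binary constraint per pair of neighboring provinces.
factors = [lambda x, a=a, b=b: 1 if x[a] != x[b] else 0 for a, b in neighbors]

def weight(x):
    """Weight(x) = product of all factors applied to the assignment x."""
    w = 1
    for f in factors:
        w *= f(x)   # a single 0 vetoes the whole assignment
    return w

# A proper coloring has weight 1 * 1 * ... * 1 = 1...
good = {"WA": "red", "NT": "green", "SA": "blue", "Q": "red",
        "NSW": "green", "V": "red", "T": "green"}
print(weight(good))   # 1

# ...but recoloring NT red clashes with both WA and Q, so the weight is 0.
bad = dict(good, NT="red")
print(weight(bad))    # 0

# The CSP objective: argmax over all 3^7 assignments of weight(x).
best = max((dict(zip(variables, vals)) for vals in product(domain, repeat=7)),
           key=weight)
print(weight(best))   # 1, so this CSP is satisfiable
```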
A bit of terminology: an assignment is consistent if its weight is greater than zero. The weight can't be negative, because all the factors return non-negative numbers; so if the weight is zero, that means the assignment is inconsistent. And the objective of a constraint satisfaction problem, finally getting to the point of all this, is to find the maximum weight assignment; mathematically it's written as the argmax, over all possible assignments x, of Weight(x). A constraint satisfaction problem is said to be satisfiable if the weight of a maximum weight assignment is greater than zero; another way to say the same thing is that there exists some consistent assignment.

[00:14:45] Note one thing: the weight here, in the context of factor graphs and constraint satisfaction problems, is not the same as the weights that we study in machine learning. Those
weights can be negative or non-negative, but the weights in constraint satisfaction problems and factor graphs have to be non-negative. One other small comment: here we are actually defining a slight generalization of constraint satisfaction problems, where factors can have not just zero or one as weights, but any non-negative value.

[00:15:29] Constraint satisfaction problems are actually a general umbrella term that captures several important cases. The first is boolean satisfiability problems, otherwise known as SAT. In these cases the variables are boolean-valued and the factors are logical formulas, such as x1 OR NOT x2 OR x5. Satisfiability problems are NP-complete, which means that in the worst case they're really, really hard and we don't have efficient algorithms for solving them.
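To make the SAT setting concrete, here is a toy brute-force check of the clause just mentioned, x1 OR (NOT x2) OR x5, treated as a 0/1-valued factor over boolean variables. Real SAT solvers (DPLL, CDCL) are vastly more sophisticated, but on five variables plain enumeration is instant:

```python
from itertools import product

def clause(x1, x2, x3, x4, x5):
    """The clause x1 OR (NOT x2) OR x5 as a constraint factor (0 or 1)."""
    return 1 if (x1 or (not x2) or x5) else 0

# Enumerate all 2^5 boolean assignments and keep the satisfying ones.
satisfying = [vals for vals in product([False, True], repeat=5)
              if clause(*vals) == 1]
print(len(satisfying) > 0)   # True: the formula is satisfiable
print(len(satisfying))       # 28 of the 32 assignments satisfy the clause
```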
But in practice, it turns out that there's been an extraordinary amount of progress in SAT solving, and we can routinely solve SAT problems with many, many more variables than we might predict by theory alone. There's a joke that says: theoreticians reduce a problem to SAT if they want to show that it's hard to solve, and practitioners reduce a problem to SAT if they want to solve the problem.

[00:16:38] Another class of problems that is important is linear programming. In linear programs the variables are real-valued numbers and the factors are linear inequalities, such as x2 + x3 + x5 ≤ 1. Despite the fact that the variables can take on an infinite number of values, linear programs have a special structure that makes them especially efficient to solve, and there's been a lot of work in solving linear programs efficiently.
Integer linear programs are the same as linear programs, except that the variables are integer-valued, and the fact that they're integer-valued makes these incredibly hard, again just like satisfiability problems. Mixed integer linear programs are problems where some variables are reals and some are integers, and these problems are also hard to solve.

[00:17:35] So in summary, we formally defined the notion of a factor graph, which includes variables and factors. Variables specify unknown quantities that we need to ascertain, and factors specify preferences or constraints over partial assignments. One thing that's special about factor graphs is that you're specifying constraints and preferences in a local way: suppose you're modeling, and you think of a particular preference that you have; you can just simply write down a factor
in terms of the variables that matter, and throw that factor into the constraint satisfaction problem. And now the hard work comes in actually processing this set of factors. So a key definition is the weight of an assignment, which is the product of all the factors, and this is where all the magic happens: this is where you have to think globally about all the factors together. The point of a constraint satisfaction problem, again, is to find the maximum weight assignment, and this is again something that requires global reasoning over all the factors. So the motto to remember here is: specify locally when you're modeling, and optimize globally, which is what the inference algorithm will do.

[00:19:05] That's the end of this module.

================================================================================ LECTURE 025 ================================================================================
Constraint Satisfaction Problems (CSPs) 3 - Examples | Stanford
CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=Tu6BiZhMDCc
---
Transcript

[00:00:05] Hi. In this module I'm going to show you how you can take some real-world problems and model them as constraint satisfaction problems. We'll begin with our first example. The LSAT is the standardized test for admission into law school, and it features these logic puzzles. Here's one example of a logic puzzle: imagine you have three sculptures, A, B, and C, that are to be exhibited in two rooms, one and two, of an art gallery. The exhibition has imposed a certain number of conditions on you: sculptures A and B cannot be in the same room; sculptures B and C must be in the same room; and room two can hold only one sculpture.

[00:00:49] So how do you model this as a constraint satisfaction problem? Let's do it via this JavaScript demo. Erase that and start over. The first thing you want to do when you model is
figure out what the variables are. Looking back here, we want to put the three sculptures in rooms, so let's just define a variable for each of these sculptures.

[00:01:15] In this JavaScript demo, I'm going to define a variable A, and the domain of A is either one or two, depending on which room sculpture A should be placed in. I hit step, and I get a variable; I can mouse over it and see the domain of that variable. Okay, so now I can do the same for the other two sculptures, B and C, and you'll see that now I have three variables, A, B, and C, each of which can take on the values one or two.

[00:01:55] So now let me define the factors. I'm going to define a factor for each of these three conditions; usually each condition corresponds to a factor, but as we'll see later, that's not always the case. So the first condition
says that sculptures A and B cannot be in the same room. This is naturally a factor that touches variables A and B, so I'm going to call that factor AB; its scope is the variables A and B. And remember, a factor is a function that takes an assignment to the variables in its scope, A and B in this case, and returns a non-negative number. In this case I want it to be the case that A and B are not in the same room, so I'm going to return A not equal to B. If I hit enter, that gives me this factor, and I can check its table, which says 1, 2 is good and 2, 1 is also good, but 1, 1 and 2, 2 are not good.

[00:03:07] So now I'm going to move on to the second condition: sculptures B and C must be in the same room. This is similar, but now applied to B and C; they have to be in the same room, so I'm just going to
return B equals C. I'm going to check that this factor does what I want it to do: it's happy with 1, 1 and 2, 2, which is good.

[00:03:37] And now what about the final condition: room two can hold only one sculpture? This one's a little bit tricky, because it doesn't mention the sculptures exactly; it mentions only the room. But what it really means is that I have to look at all the sculpture variables. So I'm going to define a factor, let's call it R2, which depends on all the variables here, and I'm going to need to figure out whether room 2 has at most one sculpture. So let's keep a counter, and we're going to go through all the sculptures: if sculpture A is in room 2, I'm going to increment the counter; if sculpture B is in room 2, I'm going to increment the counter; and if sculpture C is in room 2, I'm going to increment
the counter. Now I'm going to return whether the number of sculptures in room 2, which is now n, is at most 1. Okay, so I make that factor, and I can see that this factor is happy if at most one sculpture, one or zero, is in room 2.

[00:04:56] Okay, so now I have defined my constraint satisfaction problem, or factor graph: a set of variables and a set of factors. And now if I press step, it will magically solve the CSP, and here there is one satisfying assignment, which assigns A to room two and assigns B and C to room one. So that's our first example of solving a constraint satisfaction problem.

[00:05:32] Here is another example, from object tracking. Suppose you're trying to build an autonomous driving system: you want to track where objects such as cars and pedestrians are, so you know where not to drive.
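As a cross-check on the demo result above, the whole sculpture CSP can be brute-forced in a few lines of plain Python. This is a restatement of the example, not the JavaScript demo's own code:

```python
from itertools import product

def weight(a, b, c):
    """Weight of placing sculptures A, B, C in rooms a, b, c (each 1 or 2)."""
    f_ab = 1 if a != b else 0                # A and B in different rooms
    f_bc = 1 if b == c else 0                # B and C in the same room
    n = sum(1 for v in (a, b, c) if v == 2)  # sculptures placed in room 2
    f_r2 = 1 if n <= 1 else 0                # room 2 holds at most one
    return f_ab * f_bc * f_r2

# Enumerate all 2^3 placements and keep the consistent ones.
solutions = [(a, b, c) for a, b, c in product([1, 2], repeat=3)
             if weight(a, b, c) == 1]
print(solutions)   # [(2, 1, 1)]: A in room 2, B and C in room 1
```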
So we're going to work with a very simplified setup here. The setting is that we have a number of discrete time steps: 0, 1, 2, 3, 4. At each time step, we get a sensor observation that gives us a noisy indicator of the position of a particular object. So maybe at time step one I observe that the object was at 0; at time step two I get an observation of 2; and at time step three I get an observation of 2. So the noisy sensors report the positions 0, 2, and 2.
[00:06:30] And we know that objects can't teleport. So the question is: what trajectory did the object take? Did it do something like this, and the sensor readings were actually correct? Or maybe it did something like that, or something completely different?

[00:06:50] So how do we do this? We're going to set up an object tracking CSP. Let's first define a factor graph. The variables of the factor graph are the position of the object at each time step, 1, 2, or 3 (there are three time steps), and the domain of each variable is 0, 1, or 2, so the object could be in position 0, 1, or 2. So Xi represents the true position of the object at time step i.

[00:07:24] Now we're going to define a bunch of observation factors, which incorporate the sensor information into the problem. So remember, at time step 1 we
observed that the object was at 0. Of course this is noisy, so we don't want to trust it completely; we're going to define an observation factor o1 that captures this. So o1 is going to be a unary factor: it depends only on X1, and it's going to highly favor assigning X1 to 0, which is the actual observation. But if X1 is at 1, which is a neighboring location, that's going to have a weight of 1. And if the object is too far away, at 2, then I'm going to say that's disallowed. So whenever you see a factor returning a weight of 0, that's a veto.

[00:08:28] Okay, so o2 is similar but applied to X2, which is the position of the object at time step 2. It's going to favor X2 being 2, degrade the weight if it's 1 away, and forbid it if it's 2 away. And o3 is also similar, but applied to X3, which is the object's position at time step 3. It's going to
favor X3 being 2, but also degrade and forbid assignments that are too far away.

[00:09:02] Okay, so we have three observation factors that capture the sensor readings. Now we're going to define transition factors, which represent the fact that an object's position can't change too much, or in other words, that objects can't teleport. Here we're going to write this factor a little differently; it's going to be a bit more compact. We're going to look at the absolute difference between the object's position at time step i and its position at the next time step, i + 1. If the object hasn't moved, which means the difference is 0, I'm going to assign a weight of 2; if it's moved by 1, I'm going to assign a weight of 1; and if it's moved by 2, I'm going to assign a weight of 0, which disallows it.

[00:09:52] Okay, so this concludes the
definition of the constraint satisfaction problem for this simple object tracking example. If I click on the demo, I can see what the CSP looks like in JavaScript code. I've defined three variables, x1, x2, x3. I define this helper function nearby, which returns 2 if a and b are equal, 1 if they're 1 apart, and 0 if they're 2 apart. And then I define these factors: o1, o2, o3, and t1 and t2. If I solve this CSP, it will return the set of nonzero-weight assignments, and I'll see that the maximum weight assignment is 1, 2, 2. So this is a solution to the CSP: it assigns X1 = 1, X2 = 2, and X3 = 2. Looking at the picture, it's 1, 2, 2, so we think the object probably took this path.

[00:11:07] Okay, so that's the end of this example. Now let's look at a third example:
event scheduling. CSPs are really well suited for scheduling problems in general. So here is a simple scheduling problem: you have a set of events that need to be assigned to a number of time slots. The events are numbered 1 through E and the time slots are numbered 1 through T. We have three conditions here. The first condition is that each event must be put in exactly one time slot. Condition 2 says that each time slot can have at most one event, so you can't double-book two events into one time slot. And condition 3 says that event e is allowed in time slot t only if the pair (e, t) exists in a set A of allowed pairs. So I can visualize A as a set of edges between the events and the time slots. And here is one possible assignment: I assign event 1 to time slot 2, assign event 2 to time slot 1,
and assign event 3 to time slot 3. Notice that I can't assign event 2 to time slot 2, because that would violate C3: there's no edge between event 2 and time slot 2.

[00:12:36] Okay, so how are we going to model this as a CSP? I'm actually going to show you not one but two possible formulations, which goes to show that there's some flexibility, or you could say artistic license, in how you decide to formulate problems as CSPs.

[00:12:58] The first formulation is going to look at it from the events' perspective. Here, for each event e, I'm going to define a variable Xe, and the domain of Xe is going to be some integer 1 through T. So notice that right off the bat I've satisfied condition C1, because in a CSP every variable has to take on exactly one value. And so that means that each
event will be put in exactly one time slot. So what about C2? Now I have to do something for C2. Notice that C2 is in terms of time slots, but our variables are in terms of events. So, as in the earlier puzzle example, that means we implicitly have to define factors that relate the variables. I'm going to define a constraint on every pair of events, making sure that the time slot that event e was assigned is not the same as the time slot that event e' was assigned. If I check this for all pairs of events, I've satisfied C2: I can guarantee that no time slot has two events piled onto it.

[00:14:31] Okay, so now what about C3? Each event is only allowed in certain time slots. So here again I'm going to look at each possible event, and I'm simply going to enforce
that whatever time slot event e was assigned, denoted Xe, the pair (e, Xe) is in the set of allowed (event, time slot) pairs. And that's enough to satisfy condition 3.

[00:15:01] Okay, so that's the first formulation of the CSP. Now let's look at an alternative formulation, this time from the perspective of time slots. Here I'm going to define a variable Yt for every possible time slot t, and Yt can take on a value which is either one of the possible events or "none," which means that no event has been assigned to that time slot. So notice that right off the bat I've satisfied condition 2, because every variable gets assigned exactly one value, either an event or no event, so you can't possibly assign two events to one time slot. Now we have to deal with condition 1. So how do we deal with it?
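As a concrete sketch of the first (event-centric) formulation just described: one variable Xe per event with domain {1, ..., T}, pairwise not-equal constraints for C2, and membership in A for C3. The particular allowed-pair set A below is an illustrative assumption, since the lecture only draws A as edges in a figure (it does contain (1,2), (2,1), (3,3) and not (2,2)):

```python
from itertools import product

E, T = 3, 3  # events 1..E, time slots 1..T
# Hypothetical allowed (event, time slot) pairs; A is only shown
# pictorially in the lecture, so this exact set is an assumption.
A = {(1, 2), (2, 1), (3, 1), (3, 3)}

def consistent(assign):
    """assign[e-1] is the time slot of event e. C1 holds by construction,
    since each variable takes on exactly one value."""
    # C2: no two events share a time slot (pairwise not-equal constraints).
    if len(set(assign)) != len(assign):
        return False
    # C3: every (event, slot) pair must be an allowed edge in A.
    return all((e + 1, t) in A for e, t in enumerate(assign))

solutions = [a for a in product(range(1, T + 1), repeat=E) if consistent(a)]
print(solutions)  # [(2, 1, 3)] -- the assignment shown in the lecture
```

The single consistent assignment, (2, 1, 3), is exactly the one in the lecture's picture: event 1 in slot 2, event 2 in slot 1, event 3 in slot 3.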
[00:15:56] So here, all the variables are in terms of time slots, but condition 1 is in terms of events. So again we're going to have to define a constraint that touches all the variables. For every event e, I need to enforce that, looking over all the time slots, that event shows up exactly once. What this is saying is that this factor looks at all of Y1 through YT and checks that Yt = e for exactly one t.

[00:16:39] So this will check the box for C1. And C3 is similar to before: for every time slot, we're going to enforce that either nothing was scheduled in that time slot, or, if something was scheduled, that the event and that time slot are compatible.

[00:16:59] Okay, so that concludes the definition of the second formulation. And now one might wonder which one
is better. This is a matter of efficiency, and there are various trade-offs, which are discussed more in the notes.

[00:17:19] Okay, so here is a final example of a CSP, which is going to be a little bit different, and so it will be kind of interesting: program verification. Everyone writes programs, and you're probably used to the idea of writing unit tests to check whether a program is correct. But just because your program passes a bunch of tests doesn't actually guarantee that it's correct, because you're never sure that you've covered all the cases. The idea behind program verification is to prove that your program works for all possible inputs.

[00:17:56] So let's work through a simple example. Suppose you have this program foo, which takes in two values x and y, and computes the following: it's going to assign x times x to a, it's going to add y times y to a and
then assign that to b, and then it's going to subtract a quantity from that, assign the result to c, and return c. The thing I want to prove here is the following specification: c is greater than or equal to zero, no matter what values x and y take.

[00:18:32] So here is how I'm going to specify the CSP. I'm going to define a set of variables that corresponds to both the inputs and also the intermediate quantities computed along the way: x, y, a, b, and c. And now I'm going to define a set of constraints corresponding to the program statements, which are going to relate these variables. For the first constraint, I'm going to have a = x², which captures what the first statement is doing; b = a + y², which captures the second program statement; and c = b − 2xy,
which captures the third program statement.

[00:19:18] Now, an important but really subtle note is that "equals" means two things here. In the Python program, = is an assignment operator: it says take the right-hand side, compute its value, and put it in the variable on the left-hand side. Whereas in the CSP, = represents mathematical equality: it asks whether the left-hand side is equal to the right-hand side. So don't be deceived by the looks of it: this factor is actually a function that takes in a value of a and a value of x and checks whether a equals x², returning a 1 or a 0.
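The checking-versus-assignment distinction can be made concrete in a couple of lines (a hypothetical sketch, not code from the lecture's demo):

```python
# Procedural assignment: compute the right-hand side and store it in a.
x = 3
a = x * x          # a is now 9

# CSP factor: given candidate values for a and x, *check* whether the
# constraint a == x^2 holds, returning weight 1 (satisfied) or 0 (veto).
def factor_a(a, x):
    return 1 if a == x * x else 0

print(factor_a(9, 3), factor_a(8, 3))  # 1 0
```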
So it's doing checking, whereas a = x * x is doing assignment: it's taking x² and putting it into a.

[00:20:17] Now there's a final constraint for the specification, and this is also kind of interesting. Note that we wanted to check that c ≥ 0 for all x and y, but we're going to negate that here, because CSPs only look for the existence of a particular assignment; they can't natively check all possible assignments. So we're going to negate it. Intuitively, what this is doing is looking for a counterexample: it's saying, hey, can we find a setting of x, y, a, b, and c such that c < 0?
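A brute-force version of this counterexample search, as a sketch: enumerate assignments to (x, y, a, b, c) and check all the program factors plus the negated specification. (A real verifier searches symbolically over all integers; the small finite window here is an assumption for illustration.)

```python
from itertools import product

R = range(-3, 4)  # small illustrative window of integer values
counterexamples = [
    (x, y, a, b, c)
    for x, y, a, b, c in product(R, repeat=5)
    if a == x * x           # factor checking the statement a = x * x
    and b == a + y * y      # factor checking b = a + y * y
    and c == b - 2 * x * y  # factor checking c = b - 2 * x * y
    and c < 0               # negated specification: look for c < 0
]
# No satisfying assignment exists, so within this window the
# specification c >= 0 holds (indeed c == (x - y)**2 >= 0 always).
print(counterexamples)  # []
```

An empty result means the CSP is unsatisfiable, which is exactly the condition under which the program meets its specification.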
[00:21:02] And if we can, that means the specification doesn't hold: there's a counterexample. But if we are not able to find any consistent assignment, if the CSP is not satisfiable, that means the program satisfies the specification. It's maybe a little counterintuitive at first, but we're proving correctness based on the fact that the CSP has no satisfying assignments.

[00:21:33] One thing that's really cool and interesting about formulating the program as a CSP, and about the fact that this mathematical equality is bidirectional, is that the CSP can reason in no particular order. It can start with the constraint c < 0 and work backwards through c, b, and a; or it can work forwards starting with x and y; or it can proceed in a more sophisticated order. Whereas if you were only to execute the
program, you can only go forwards. So this shows you the flexibility and power of reasoning over programs using a constraint satisfaction problem.

[00:22:19] Okay, so we've presented a number of examples of real-world problems and shown you how to formulate each of them as a CSP (or two). So how do you do it? Well, the first step is to decide on the variables and the domains, and you want to check that an assignment to all of these variables gives you the result of interest. Then we take a look at all the desiderata, the constraints and the preferences, the wishes, and translate them into a set of factors. The nice thing about CSPs is that this process is often parallelizable: usually each desideratum translates into a factor or a set of factors, and then at the end of the day you just throw all the factors into your CSP.

[00:23:09] So there are some
notes to keep in mind when you're designing constraint satisfaction problems. You should keep the CSPs small so that they will be more computationally efficient to solve, which means having fewer variables, fewer factors, smaller domains, and smaller arities. You can't make everything small, and there are various trade-offs; exactly what the recipe for computational efficiency is really depends on the problem. There's no general rule, so there's going to be a little bit of art here.

[00:23:51] And finally, one reminder: when you think about implementing each factor, it is true that each factor is itself a little mini-program, but you should really think of it in terms of checking a solution, checking whether an assignment to the variables of that factor is valid, rather than trying
to compute the solution. So "equals" is mathematical equality rather than assignment. This is really important, and it takes a little getting used to, because CSPs require a fundamentally different mindset from normal procedural programming, which is most salient in the program verification example. But hopefully, after a bit of practice, thinking in terms of CSPs will become second nature.

[00:24:46] All right, so that's the end of this module.

================================================================================ LECTURE 026 ================================================================================ Constraint Satisfaction Problems (CSPs) 4 - Dynamic Ordering | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=Lyu8VzbIe_A --- Transcript

[00:00:06] Hi. In the previous module we looked at modeling. In this module I'm going to start talking about inference, in particular introducing backtracking with dynamic ordering. So, just a
So, just a quick refresher: remember that a CSP is defined by a factor graph, which has a set of variables X1, ..., Xn, where each variable Xi takes on values in Domain_i, and it also has some factors f1 through fm, where each factor fj is a function that takes a subset of the variables and returns a non-negative quantity. [00:00:50] The assignment weight is defined as follows: each assignment to all the variables has a weight, which is given by the product of all the factors, and the goal in solving a CSP is to compute the maximum weight assignment.
[00:01:15] So let's start with backtracking search, which we already talked about a little bit. Backtracking search is going to be the blueprint for the algorithms that we're going to talk about. We start with the empty assignment, where no variable has any value, and we choose one of the variables and assign it a particular value, red in this case.
Then we recurse: we pick another variable, assign it a value, and keep recursing. Then we back up, backtrack, and try green; backtrack, try blue; and we backtrack up here. Now we're going to try setting WA to green down here, explore this subtree, then come back up, explore NT green, NT blue, and so forth. [00:02:19] So at the bottom of this tree we have the leaves, and each leaf is a complete assignment, and each assignment has a weight which we can compute. Once we've searched through all the assignments, we simply take the assignment with the maximum weight. This is the most straightforward way of taking a CSP and solving it using backtracking search. [00:02:53] The first thing we'll note is that we can actually compute the weights of partial assignments as we go, rather than waiting until the very end to compute the weight of an entire assignment.
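The naive enumerate-every-leaf approach just described can be sketched in Python on a small fragment of the lecture's Australia map-coloring CSP. This is a minimal, hypothetical sketch (the representation of factors as (scope, function) pairs is my assumption, not the course's codebase):

```python
from itertools import product

# Hypothetical toy CSP: color part of the Australia map with 3 colors.
variables = ["WA", "NT", "SA", "Q"]
domains = {v: ["R", "G", "B"] for v in variables}
# Each factor checks one adjacency constraint and returns 0.0 or 1.0.
factors = [
    (("WA", "NT"), lambda a, b: float(a != b)),
    (("WA", "SA"), lambda a, b: float(a != b)),
    (("NT", "SA"), lambda a, b: float(a != b)),
    (("NT", "Q"),  lambda a, b: float(a != b)),
    (("SA", "Q"),  lambda a, b: float(a != b)),
]

def weight(assignment):
    """Weight of a complete assignment = product of all factors."""
    w = 1.0
    for scope, f in factors:
        w *= f(*(assignment[v] for v in scope))
    return w

def brute_force():
    """Visit every leaf of the search tree; keep the max-weight assignment."""
    best, best_w = None, 0.0
    for values in product(*(domains[v] for v in variables)):
        a = dict(zip(variables, values))
        w = weight(a)
        if w > best_w:
            best, best_w = a, w
    return best, best_w

best, best_w = brute_force()
print(best, best_w)  # a consistent coloring with weight 1.0
```

This enumerates all 3^4 = 81 leaves; the rest of the module is about doing far less work than this.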
[00:03:07] So here's how it's going to proceed. Let's start with the empty assignment, and we assign WA red. We can't evaluate any of the factors so far, but once we assign NT, we can actually evaluate this factor and test whether WA ≠ NT. These other factors we can't evaluate yet, because we don't know the values of those variables, but we can move on. Now we recurse: we assign SA a value, and now we can evaluate these factors, WA ≠ SA and NT ≠ SA. Then we assign Q, and we can evaluate these two factors; assign NSW, and we can evaluate these two factors; [00:04:01] and assign V, and we can now evaluate these two factors, and those are all the factors in the CSP. So at any point in time, for example at NSW, we have this partial assignment here, and we define the weight of that partial assignment to be the product of all the factors that we can evaluate.
A factor is evaluable if all the variables in the scope of that factor have been set. [00:04:45] More formally, suppose we have a partial assignment x. We're going to define the set of dependent factors as follows: D of a partial assignment x and a particular variable Xi is the set of factors depending on Xi and x but not on the unassigned variables. So, for example, D of this partial assignment up here and this variable SA is simply these two factors here; these are the factors that are going to be multiplied in when Xi is set. [00:05:31] Okay, so now we're ready to present our main backtracking search algorithm. This is going to be a general blueprint for many of the bells and whistles that we're going to talk about soon. Backtrack takes a partial assignment x, the weight of that partial assignment, which is the product of all the factors that we can evaluate so far, and the domains, which specify the valid possible values for each of the variables in the CSP; more on this in a bit.
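The dependent-factor set D(x, Xi) and the resulting multiplicative weight update can be sketched as follows (a hypothetical sketch on a three-variable slice of the map-coloring example; `dependent_factors` and `delta` are my own illustrative names):

```python
# Factors as (scope, function) pairs; 0/1-valued inequality constraints.
factors = [
    (("WA", "NT"), lambda a, b: float(a != b)),
    (("WA", "SA"), lambda a, b: float(a != b)),
    (("NT", "SA"), lambda a, b: float(a != b)),
]

def dependent_factors(x, xi):
    """D(x, Xi): factors that mention Xi and whose other variables are
    already assigned in the partial assignment x."""
    assigned = set(x) | {xi}
    return [(scope, f) for scope, f in factors
            if xi in scope and all(v in assigned for v in scope)]

def delta(x, xi, v):
    """Multiplicative weight update when extending x with Xi = v."""
    x2 = dict(x, **{xi: v})
    d = 1.0
    for scope, f in dependent_factors(x, xi):
        d *= f(*(x2[u] for u in scope))
    return d

x = {"WA": "R"}
print(len(dependent_factors(x, "NT")))  # 1: only the WA-NT factor is evaluable
print(delta(x, "NT", "R"))              # 0.0  (conflicts with WA = R)
print(delta(x, "NT", "G"))              # 1.0
```

Note the NT-SA factor is not in D({WA: R}, NT) because SA is still unassigned; it gets multiplied in later, when SA is set.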
[00:06:06] If x is a complete assignment, then we have reached a leaf: we look at its weight, update our current best, and return. If not, we're going to choose an unassigned variable Xi, look at the values in Domain_i of Xi, and order them somehow. We're going to go through each value v in that order and compute a weight update: we look at the assignment which is x extended with Xi set to v, and then we look at all the factors in the dependent set of factors of x, the partial assignment, and the new variable Xi that we're going to assign, and multiply all those factors evaluated at this extended assignment. That number we're going to call delta, which is going to be the update on w.
[00:07:16] Okay, so if delta equals zero, then we stop there and don't recurse further, because remember, any factor being zero is enough to zero out the weight of a given assignment. If not, then we continue: we're going to do this thing called look ahead, which takes the domains and tries to reduce them, tries to prune things away, based on this new assignment of Xi to v. Now, if any of these domains becomes empty, then we can again prune and stop recursing; otherwise we're going to recurse and backtrack on this extended assignment with the updated weight w · delta and the new domains that we've computed via look ahead.
[00:08:11] Okay, so this recipe has three choice points: how to choose the unassigned variable, how to order the values of the unassigned variable, and finally this look ahead, which is how we prune the domains. We're going to talk about each of these in turn, starting with the look ahead.
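The whole backtracking blueprint can be sketched in a few lines of Python. This is a minimal, hypothetical version in which the three choice points are filled in with the trivial strategies (first unassigned variable, domain order as given, no look-ahead pruning), so later heuristics can be read as drop-in replacements:

```python
variables = ["WA", "NT", "SA"]
# 0/1-valued inequality factors over a triangle of adjacent regions.
factors = [
    (("WA", "NT"), lambda a, b: float(a != b)),
    (("WA", "SA"), lambda a, b: float(a != b)),
    (("NT", "SA"), lambda a, b: float(a != b)),
]
best = {"assignment": None, "weight": 0.0}

def dependent_factors(x, xi):
    assigned = set(x) | {xi}
    return [(s, f) for s, f in factors
            if xi in s and all(v in assigned for v in s)]

def backtrack(x, w, domains):
    if len(x) == len(variables):              # base case: a leaf
        if w > best["weight"]:
            best["assignment"], best["weight"] = dict(x), w
        return
    xi = next(v for v in variables if v not in x)   # choice point 1: MCV goes here
    for v in domains[xi]:                            # choice point 2: LCV goes here
        x2 = dict(x, **{xi: v})
        delta = 1.0
        for s, f in dependent_factors(x, xi):
            delta *= f(*(x2[u] for u in s))
        if delta == 0:             # one zero factor zeroes the whole weight
            continue
        # Commit xi to v in a copy of the domains.
        d2 = {u: (list(dom) if u != xi else [v]) for u, dom in domains.items()}
        # Choice point 3: look ahead (e.g. forward checking) would prune d2 here.
        if any(len(dom) == 0 for dom in d2.values()):
            continue               # some variable became unsatisfiable
        backtrack(x2, w * delta, d2)

backtrack({}, 1.0, {v: ["R", "G", "B"] for v in variables})
print(best["weight"])  # 1.0 — WA, NT, SA get three distinct colors
```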
[00:08:39] So we're going to introduce a simple form of look ahead called forward checking. First, we're going to visualize the domains of each of the variables as a set of valid colors above the respective variable; in the empty assignment, all the values are allowed. [00:09:02] Now we're going to set, let's say, WA equals red. At this point two things happen. First, we wipe out all the other values from that variable's domain, which makes it clear that we're committing to that value. But in addition, what we're going to do is a one-step look ahead, forward checking: we're going to eliminate the inconsistent values from the domains of Xi's neighbors. In this case, we're going to look at the neighbors of WA, which are NT and SA, and we're going to remove red from those domains. And why is that? Because this factor says that if this is red, then this can't be red, so red is gone now.
[00:09:50] Okay, so now backtracking search is going to recurse, and let's say it sets NT to green. Again I do a one-step look ahead: I look at the neighbors of NT, and I wipe out green from those domains. Suppose I recurse again and now set Q to blue. Again, one step of look ahead: I wipe out blue from its neighbors. Now look what happens: SA has an empty domain, which means that there are no possible values that I can set SA to, to make the assignment consistent. [00:10:39] So in this case, if any domain becomes empty, I simply return here. This is important, because all these other variables have not been set yet, and rather than recursing and trying to set them all sorts of different ways, I already know at this point that SA is not settable, so I just stop there. This is how forward checking allows me to use these domains to prune.
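The forward-checking trace just walked through (WA = red, then NT = green, then Q = blue, leaving SA's domain empty) can be reproduced with a short sketch. It assumes all factors are inequality constraints between neighbors, which is true of the map-coloring example but not of CSPs in general:

```python
# Adjacency for the relevant part of the Australia map (lecture's example).
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
             "SA": ["WA", "NT", "Q"], "Q": ["NT", "SA"]}

def forward_check(domains, xi, v):
    """Commit xi to v and do one-step look ahead: remove values that
    conflict with v from the domains of xi's neighbors (inequality factors)."""
    d = {u: list(dom) for u, dom in domains.items()}  # copy, don't mutate
    d[xi] = [v]
    for u in neighbors[xi]:
        d[u] = [w for w in d[u] if w != v]
    return d

domains = {u: ["R", "G", "B"] for u in neighbors}
domains = forward_check(domains, "WA", "R")   # NT, SA lose red
domains = forward_check(domains, "NT", "G")   # SA, Q lose green
domains = forward_check(domains, "Q", "B")    # SA loses blue -> empty!
print(domains["SA"])  # [] — prune this branch without recursing further
```

An empty domain anywhere means no extension of the current partial assignment can be consistent, so the search returns immediately.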
[00:11:12] Okay, so forward checking is also going to allow me to choose an unassigned variable and to order the values of a variable, as follows. Suppose we're in this situation: WA and NT have been set, and I've applied forward checking to propagate the constraints to all the other variables. Now the question is: which variable do I assign next? There is this heuristic called most constrained variable, MCV, which simply chooses the variable that has the smallest domain. [00:11:49] So what are the domain sizes here? There are two elements for Q, three elements here, one element here, so SA is the variable that has the smallest domain; it has only one element. The intuition here is that I want to restrict the branching factor and choose variables that have small branching, determined by the number of elements in their domains.
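The MCV heuristic itself is one line over the pruned domains. A minimal sketch, using the domain sizes from the situation in the lecture (the exact domain contents below are my reconstruction for illustration):

```python
def most_constrained_variable(domains, assigned):
    """Pick the unassigned variable with the smallest remaining domain,
    i.e. the smallest branching factor — fail early if we must fail."""
    unassigned = [v for v in domains if v not in assigned]
    return min(unassigned, key=lambda v: len(domains[v]))

# After WA and NT are set and forward checking has run:
domains = {"Q": ["R", "B"], "SA": ["B"], "NSW": ["R", "G", "B"]}
print(most_constrained_variable(domains, assigned={"WA", "NT"}))  # SA
```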
number of [00:12:15] branching determined by number of elements in that [00:12:18] domain so the [00:12:21] domain so the second choice point is once I've [00:12:24] second choice point is once I've selected a variable how do I order the [00:12:27] selected a variable how do I order the values to explore [00:12:30] values to explore so consider the following so I'm trying [00:12:32] so consider the following so I'm trying to assign a value to Q do I first try [00:12:36] to assign a value to Q do I first try red or do I try [00:12:40] red or do I try blue so the idea behind this herisa [00:12:43] blue so the idea behind this herisa called least constraint value is I'm [00:12:46] called least constraint value is I'm going to order the values of um the [00:12:51] going to order the values of um the selected x i by decreasing number of [00:12:55] selected x i by decreasing number of consistent values of neighboring VAR [00:12:57] consistent values of neighboring VAR okay so what does this mean [00:12:59] okay so what does this mean on this example so I look at [00:13:02] on this example so I look at Q and remember I set this to Red [00:13:06] Q and remember I set this to Red tentatively and I propagate um via [00:13:10] tentatively and I propagate um via forward checking to its neighbor so I [00:13:11] forward checking to its neighbor so I wiped out red here and now I look at the [00:13:14] wiped out red here and now I look at the neighbors and say how many possible uh [00:13:18] neighbors and say how many possible uh consistent values are there so there are [00:13:20] consistent values are there so there are two plus two plus two so that's uh six [00:13:24] two plus two plus two so that's uh six values and what about if I set it to [00:13:26] values and what about if I set it to Blue here and I've eliminate blue B from [00:13:30] Blue here and I've eliminate blue B from these neighbors and the number of [00:13:32] these neighbors and the number of consistent values is 
[00:13:37] Six is larger than four, so I'm going to try red first in this case. Intuitively, why does this make sense? I want to choose a value that gives as much freedom as possible to its neighbors, so that I don't run into trouble and get things to be inconsistent. You can see that by having red here and red here, I have more options for the neighbors NT and SA than over here, where I can only do green; and if you look even one step ahead, you'll notice that this is already going to cause trouble. So least constrained value orders the values in order to free up the neighbors as much as possible.
[00:14:34] This might seem a little bit strange: most constrained variable, least constrained value — these seem superficially at odds with each other. But there is a reasoning, which is that variables and values are very different.
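The LCV ordering can be sketched as follows. The neighbor domains below are chosen (as an assumption) to reproduce the lecture's counts of 2 + 2 + 2 = 6 for red versus 1 + 1 + 2 = 4 for blue, and the "consistent values" count again assumes inequality factors:

```python
neighbors = {"Q": ["NT", "SA", "NSW"]}

def lcv_order(domains, xi):
    """Order values of xi by decreasing total number of consistent values
    remaining in the neighbors' domains (most freedom first)."""
    def freedom(v):
        # With inequality factors, choosing v only rules out v itself
        # in each neighbor's domain.
        return sum(len([w for w in domains[u] if w != v])
                   for u in neighbors[xi])
    return sorted(domains[xi], key=freedom, reverse=True)

# Situation from the lecture: Q can be red or blue.
domains = {"Q": ["R", "B"],
           "NT": ["G", "B"], "SA": ["G", "B"], "NSW": ["R", "G", "B"]}
print(lcv_order(domains, "Q"))  # ['R', 'B']: red leaves 6 options, blue only 4
```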
[00:14:56] In a CSP, every variable must be assigned; we can't leave a variable alone and hope that the problem will disappear later. So what we're going to do is try to choose the most constrained variables first: if we're going to fail, we might as well choose the hardest variables first and fail early, which leads to more pruning. [00:15:24] On the other hand, for values, we just need to choose some value for each variable. So what we're going to try to do is choose the value that is most likely to lead to a solution. It doesn't matter if some value is going to cause trouble, because if we choose a value that happens to work, then maybe we'll be happy.
[00:15:52] So when do these heuristics help? The most constrained variable heuristic is useful when some of the factors are constraints.
[00:16:00] It's okay if some of the factors are not constraints, but it's important that at least one of the factors is a constraint, meaning something that can return zero. If all the factors are returning non-zero values, then none of these heuristics is going to be helpful; you kind of have to explore everything. [00:16:28] Least constrained value is useful when all the factors are constraints, in other words, when the weights are one or zero, so they have to look like this factor but not like this one. The rationale here is that we don't have to find all of the assignments in this case: if we find an assignment that has weight one, then we know we're done, because one is the maximum weight possible, and we just return immediately and stop early.
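The early-stopping argument when all factors are constraints can be made concrete with a small sketch: since every factor is 0/1-valued, the maximum possible weight is 1, so the search may return the first weight-1 assignment it finds (a hypothetical brute-force version just to illustrate the stopping rule):

```python
from itertools import product

variables = ["WA", "NT", "SA"]
# All factors are constraints: they return only 0.0 or 1.0.
factors = [lambda a: float(a["WA"] != a["NT"]),
           lambda a: float(a["NT"] != a["SA"]),
           lambda a: float(a["WA"] != a["SA"])]

def first_satisfying(domains):
    """Return the first assignment of weight 1; weight 1 is optimal,
    so there is no need to keep searching for something better."""
    for values in product(*(domains[v] for v in variables)):
        a = dict(zip(variables, values))
        if all(f(a) == 1.0 for f in factors):
            return a
    return None

a = first_satisfying({v: ["R", "G", "B"] for v in variables})
print(a)  # first consistent coloring found
```

With general non-negative weights (2, 4, 17, ...) this shortcut is unsound, because a later assignment could always have a larger weight.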
different [00:17:15] values oh VAR weights of different magnitudes then we can't necessarily [00:17:17] magnitudes then we can't necessarily stop if we find a weight of two well [00:17:20] stop if we find a weight of two well maybe there's a weight another [00:17:21] maybe there's a weight another assignment that has a weight of four or [00:17:22] assignment that has a weight of four or eight or 17 or so on we don't really we [00:17:26] eight or 17 or so on we don't really we can't really stop early enough [00:17:30] can't really stop early enough and notice that we need forward checking [00:17:33] and notice that we need forward checking uh to make both of these things work [00:17:35] uh to make both of these things work because these horis sixs rely on [00:17:38] because these horis sixs rely on Counting the number of elements in [00:17:40] Counting the number of elements in domains and so we need to groom or prune [00:17:42] domains and so we need to groom or prune these domains so that theistic [00:17:47] are okay so let's conclude here so we [00:17:51] are okay so let's conclude here so we presented backtracking search so [00:17:54] presented backtracking search so backtracking search has uh three Choice [00:17:57] backtracking search has uh three Choice points first we need to choose an [00:18:00] points first we need to choose an unassigned variable XI this is done by [00:18:04] unassigned variable XI this is done by using um most constrain variable [00:18:08] using um most constrain variable MCV once we found a variable to assign [00:18:12] MCV once we found a variable to assign we're going to order the values of that [00:18:16] we're going to order the values of that unassigned variable based on the lcv [00:18:20] unassigned variable based on the lcv heris or least constrain [00:18:23] heris or least constrain value and then we're going [00:18:26] value and then we're going to um compute the updated weight as we [00:18:30] to um compute the updated weight 
[00:18:34] Then we're going to update the domains via one-step look ahead, a.k.a. forward checking; and if the number of elements in any domain is zero, then we stop there and don't recurse; otherwise we recurse. [00:18:52] Notice that none of these heuristics is guaranteed to speed up backtracking search; there's no theory here, but often in practice they can make a big difference. Next time we'll look at the look ahead and see how we can improve upon forward checking. So that's it.
================================================================================ LECTURE 027 ================================================================================
Constraint Satisfaction Problems (CSPs) 5 - Arc Consistency | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=5rlIYGJdPy4
--- Transcript
[00:00:05] Hi. In this module I'm going to be talking about the notion of arc consistency. This is going to lead us to an algorithm called AC-3, which is going to enable us to prune domains much more aggressively than before in the context of backtracking search.
[00:00:21] Let's begin. First I want to review backtracking search. Backtracking search is a recursive procedure that takes a partial assignment x, its weight, and the domains of each of the variables in the CSP. If all the variables have already been assigned in x, then we just see if it's better than the best assignment we've seen so far, and if so, update it, and then we return; this is the base case. [00:00:55] Otherwise, we're going to choose an unassigned variable Xi, look at all the values in the domain of Xi, and order them according to some heuristic (LCV). Now we're going to step through each of the values v in that order; we compute the weight update based on Xi being set to v, and if this is zero, then we can just stop recursing right there. Otherwise, we're going to use this updated assignment as an input into the look-ahead algorithm to reduce the domains.
[00:01:41] And now, if any of the domains becomes empty, then again we stop recursing; otherwise we recurse. [00:01:49] So last time we talked about the heuristics for choosing an unassigned variable and for ordering the values — these are the MCV and LCV heuristics — and then we looked at forward checking, which was a one-step look ahead. Now we're going to upgrade that to AC-3.
[00:02:08] Before we get into AC-3, I need to talk about arc consistency. Let's use a simple example. Suppose we have just two variables, Xi and Xj. Xi can be 1, 2, 3, 4, or 5, and Xj can be 1 or 2, and Xi and Xj are related via a single factor which says that their sum must equal 4 exactly. [00:02:39] So what does it mean to enforce arc consistency on, let's say, Xi?
means I'm going to go through each of the values in the domain of Xi and try to eliminate it, eliminating it if it can't be satisfied by any value in Xj's domain. [00:02:58] Okay, so let's try this. Look at 1: does there exist any possible setting of Xj so that I can do 1 plus something to get 4? 1 plus 1 is not 4, 1 plus 2 is not 4, so therefore 1 is just impossible, without even knowing the value of Xj, so let me eliminate it. [00:03:20] What about 2? Well, I can set Xj to 2 to get 4, so that's okay. Notice that it's fine that 1 plus 2 isn't 4; it just matters that there exists one of the values in Xj that works. So let's leave 2 alone. What about 3? Well, 3 plus 1 is 4, so that's okay too. What about 4? I can't add 4 to 1 or 2 to get 4, so that gets eliminated, and same with 5. [00:03:55] So in the end, enforcing arc
consistency on Xi results in a smaller domain, which only consists of 2 and 3. So notice I can eliminate values without even knowing what the exact value of Xj is. [00:04:13] So more formally, arc consistency is a property, which I'll explain. A variable Xi is arc consistent with respect to another variable Xj if for each value in the domain of Xi there exists some value in the domain of Xj such that essentially all the factors check out. Formally, what that means is that if you look at all the factors whose scope contains Xi and Xj and you evaluate each such factor on Xi and Xj, then you get something that's not zero. [00:04:56] Okay, so enforcing arc consistency is a procedure that takes two variables and simply removes values from Domain_i to make Xi arc consistent with respect to Xj, exactly what we did in the example on the previous slide.
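As a concrete sketch, the enforcement procedure just described can be written directly from the definition: keep a value of Xi only if some value in Xj's domain makes every shared factor nonzero. This is a minimal Python illustration with hypothetical helper names (not the course's reference code), run on the Xi + Xj = 4 example from the lecture.

```python
def enforce_arc_consistency(domains, i, j, factors):
    """Shrink domains[i] so that Xi is arc consistent w.r.t. Xj.

    factors: binary functions f(xi_value, xj_value) -> non-negative weight;
    for simplicity we assume every factor's scope is exactly {Xi, Xj}.
    Returns True if domains[i] changed.
    """
    old = domains[i]
    domains[i] = {a for a in old
                  if any(all(f(a, b) > 0 for f in factors) for b in domains[j])}
    return domains[i] != old

# The lecture's example: Xi in {1..5}, Xj in {1, 2}, one factor Xi + Xj == 4.
domains = {"i": {1, 2, 3, 4, 5}, "j": {1, 2}}
changed = enforce_arc_consistency(domains, "i", "j", [lambda a, b: a + b == 4])
print(domains["i"])  # {2, 3}: 1, 4, and 5 are eliminated, as in the lecture
```

Note that Xj's domain is untouched: enforcing arc consistency on Xi with respect to Xj only ever shrinks Domain_i.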
[00:05:18] So let's revisit the Australia example and apply AC-3. Okay, so here is the empty assignment, and here are all the domains of each of the variables. [00:05:36] So let's suppose we set WA to be red. As before, we eliminate the other values from WA's domain, of course, and then we enforce arc consistency on the neighbors of WA, in this case NT and SA, so out goes red on both of these. And now we continue and try to enforce arc consistency on the neighbors of NT and SA, but in this case I can't actually eliminate anything, so now we're going to recurse. [00:06:19] And suppose now, in the next level of backtracking, we assign NT green. So now again we're going to enforce arc consistency on the neighbors of NT, so that will eliminate green from these two. So notice that one step should look very, very familiar; this is exactly
forward checking. But AC-3 doesn't stop there; it then says enforce arc consistency on the neighbors of Q and SA. Okay, so let's enforce arc consistency on the neighbors of SA; that eliminates blue from its neighbors. [00:07:03] And now let's enforce arc consistency on the neighbors of Q, so that eliminates red from the neighbors. And now let's enforce arc consistency on the neighbors of NSW; well, that eliminates green, and at this point now we're done. [00:07:27] So notice what happened: each of these domains is only left with one value. So even though we're still in the context of backtracking search at NT, and we're still trying to figure out what to do with NT, by looking ahead we've actually seen what values are even possible, and we've essentially solved the problem. Now, formally, we haven't set these values yet; we've just reduced their
domains. But for backtracking search, recursing on the rest of these values should be really a walk in the park: you go into SA and you set it to blue, set Q to red, NSW to green, and V to red, and you're done. [00:08:13] So this shows you the power of AC-3: in one fell swoop it can basically clean out a lot of the domains and reveal what assignment values are actually possible here. [00:08:30] So here is AC-3 more formally. Remember forward checking: what you do is, when you assign the variable Xj to some value xj, literally you set the domain to only include that value, and then you enforce arc consistency on each neighbor Xi with respect to Xj. [00:08:58] Okay, so here's a picture: you're setting Xj, and then you consider all the neighbors of Xj, for example Xi, and then you enforce arc consistency on Xi.
So you try to propagate what you know about Xj to Xi and try to eliminate values from Xi's domain. [00:09:17] So now AC-3 just repeatedly enforces arc consistency until there's nothing left to do. So here's the algorithm: we're going to maintain a working set S of variables that we need to process. We start with Xj, which is the variable that we just assigned, and while there are still variables to process, we're going to remove any Xj from S; the order doesn't really matter here. Then, for each of the neighbors Xi of Xj, we're going to enforce arc consistency on that neighbor with respect to Xj, so propagate the constraints out. [00:09:59] And now, if the domain of Xi changed, then we're going to add Xi to S, because we know more about Xi now and we can hopefully propagate the information farther to its neighbors. [00:10:15] So notice that a variable could be revisited multiple times, so this is kind of like breadth
breadth [00:10:23] times so this is kind of like breadth first search with exception that [00:10:26] first search with exception that you might visit a node [00:10:28] you might visit a node more than once because you might [00:10:30] more than once because you might propagate some value to another neighbor [00:10:33] propagate some value to another neighbor and that [00:10:34] and that uh value might be [00:10:36] uh value might be constrained something else and then you [00:10:38] constrained something else and then you might get more additional information [00:10:40] might get more additional information back um and this can kind of go on for a [00:10:43] back um and this can kind of go on for a while but it does run in [00:10:46] while but it does run in polynomial time you can read the notes [00:10:48] polynomial time you can read the notes for a little bit more details about the [00:10:49] for a little bit more details about the running time [00:10:52] so [00:10:54] so as great as ac3 might seem it's not a [00:10:57] as great as ac3 might seem it's not a panacea and it shouldn't be and it [00:11:00] panacea and it shouldn't be and it shouldn't be surprising because solving [00:11:02] shouldn't be surprising because solving a csp should take an exponential time [00:11:05] a csp should take an exponential time in general and ac3 isn't doing [00:11:08] in general and ac3 isn't doing any sort of backtracking search [00:11:10] any sort of backtracking search so here is a small example that shows [00:11:13] so here is a small example that shows when ac3 doesn't do anything [00:11:16] when ac3 doesn't do anything so here we have a mini australia here [00:11:18] so here we have a mini australia here with three [00:11:19] with three variables [00:11:20] variables and suppose each of them can either be [00:11:22] and suppose each of them can either be red or blue red or blue red or blue [00:11:25] red or blue red or blue red or blue so immediately you should [00:11:27] 
realize that there is no consistent assignment of three variables with only two colors such that no pair can have the same color. But what happens if you run AC-3? [00:11:44] Okay, so let's look at this factor here, between WA and NT. This is arc consistent, because if I assign WA red then NT can be blue, and if I assign WA blue then NT can be red. So if I just look at this local configuration, there's no problem; and analogously, if I look over here there's no problem, and if I look over here there's no problem. [00:12:10] So AC-3 doesn't detect a problem, even though there's no satisfying assignment. The intuition here is that AC-3, and arc consistency in general, is only looking locally at the graph, so it only detects problems that are blatantly wrong, which can be detected locally; but you can't avoid exhaustive search to actually detect the deeper problems.
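The AC-3 loop just described can be sketched in a few lines of Python. This is a hypothetical sketch (not the course's reference implementation), using pairwise not-equal constraints; running it on the mini-Australia triangle shows exactly the failure mode above: every arc is locally consistent, so no domain shrinks even though the CSP is unsatisfiable.

```python
def revise(domains, i, j, constraint):
    """Enforce arc consistency on Xi w.r.t. Xj; return True if Domain_i shrank."""
    old = domains[i]
    domains[i] = {a for a in old if any(constraint(a, b) for b in domains[j])}
    return domains[i] != old

def ac3(domains, neighbors, constraint, start):
    """Propagate from `start` until no domain changes (S is the working set)."""
    s = {start}
    while s:
        j = s.pop()
        for i in neighbors[j]:
            if revise(domains, i, j, constraint):
                if not domains[i]:
                    return False  # empty domain: inconsistency detected, prune
                s.add(i)          # Xi changed, so revisit Xi's neighbors later
    return True

# Mini Australia: WA, NT, SA pairwise not-equal, only two colors each.
domains = {v: {"red", "blue"} for v in ("WA", "NT", "SA")}
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA"], "SA": ["WA", "NT"]}
ok = ac3(domains, neighbors, lambda a, b: a != b, "WA")
print(ok, domains)  # True, and every domain is still {red, blue}: no problem found
```

Note how a variable re-enters the working set whenever its domain changes, which is why, unlike plain breadth-first search, a node can be visited more than once.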
[00:12:44] So let me summarize. Enforcing arc consistency is a way to take what you know about one variable's domain and propagate that information, via the factors, to reduce the domains of its neighbors. [00:13:01] Forward checking only applies arc consistency to the neighbors of the assigned variable, and this was already somewhat effective. AC-3 just takes that to the extreme limit and enforces arc consistency on the neighbors, and their neighbors, and their neighbors, and so on, until you converge. So it's trying to exhaustively enforce arc consistency as much as possible, to eliminate as many values from the domains as possible. [00:13:30] And of course, remember that AC-3, forward checking, and lookahead are algorithms used in the context of backtracking search to detect inconsistencies, so we can prune early, and also to maintain these
domains so that we can use them for heuristics such as MCV and LCV. Lookahead turns out to be very, very important for backtracking search: if you can look ahead and detect an inconsistency, then that saves you the work of actually having to recurse and explore a combinatorial number of possibilities. [00:14:03] Okay, that's the end. ================================================================================ LECTURE 028 ================================================================================ Constraint Satisfaction Problems (CSPs) 6 - Beam Search | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=XuWMeIHGkus --- Transcript [00:00:05] Hi, in this module I'm going to talk about beam search, a really simple algorithm for finding approximate maximum weight assignments efficiently, when you're in a hurry and don't want to incur the full cost of backtracking search. [00:00:19] So, just as review, remember constraint satisfaction, or a CSP, is defined by a factor graph, which consists of a set of variables X1 through Xn, where each Xi can be some element of Domain_i
and a set of factors F1 through Fm, where each factor Fj is a function that takes an assignment and returns a non-negative number, and usually the factor function depends only on a subset of the variables. [00:00:51] So each assignment, little x, to all the variables has a weight, and that weight is given by simply the product of all the factors applied to the assignment, and the objective is to find the maximum weight assignment. [00:01:10] So let us revisit the object tracking example. In this example, we're trying to track an object over time, and at each time step we record a noisy sensor reading of its position. So at time step 1 we see 0, at time step 2 we see 2, and at time step 3 we see 2. And the question is: what was the trajectory that the object took? Is it this one, or this one, or something else? [00:01:40] We modeled this problem as a CSP with variables X1, X2, and X3.
We defined factors that captured our intuitions about the problem. O1 captures the fact that the actual position should be close to the sensor reading: so 2 is the weight assigned to 0, meaning X1 = 0 is favored and X1 = 2 is disallowed. Similarly, O2 favors X2 = 2, and O3 favors X3 = 2. [00:02:19] And finally, the transition factors T1 and T2 favor adjacent Xi's which are close: a distance of zero gets a weight of 2, whereas a distance of one gets 1, and so on. [00:02:37] Okay, and you can click on this demo to actually play with this CSP; we'll come back to it in a bit. [00:02:49] Okay, so this is the object tracking example. So far we've seen backtracking search as a way to compute maximum weight assignments, and backtracking search essentially does an exhaustive depth-first search of the entire tree in the worst case, which can take a very, very long time.
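To make the weights concrete, here is a small Python sketch of this factor graph. The exact factor tables are an assumption inferred from the description above: weight 2 at distance 0, weight 1 at distance 1, and 0 otherwise, for both the observation and the transition factors. The weight of a full assignment is then the product of all five factors.

```python
def dist_weight(a, b):
    # Assumed factor table: 2 at distance 0, 1 at distance 1, 0 otherwise.
    return max(0, 2 - abs(a - b))

readings = [0, 2, 2]  # the noisy sensor readings at time steps 1, 2, 3

def weight(x):
    """Product of the observation factors o_i and transition factors t_i."""
    w = 1
    for i, xi in enumerate(x):
        w *= dist_weight(xi, readings[i])   # o_i(x_i): close to the reading
    for i in range(len(x) - 1):
        w *= dist_weight(x[i], x[i + 1])    # t_i(x_i, x_{i+1}): smooth motion
    return w

print(weight((1, 2, 2)))  # 8, the maximum weight assignment from the demo
print(weight((0, 1, 1)))  # 4, the assignment greedy search ends up with
```

These two values match the weights quoted later in the lecture, which is a useful sanity check on the assumed tables.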
[00:03:10] So how can we avoid this? Well, we have to give up on something, and what we're going to give up on is correctness. So what we're going to do is simply not backtrack. [00:03:25] Let's start with something called the greedy search algorithm. Again, we start with the empty assignment, and we consider possible settings of, let's say, X1. Say there are two possible settings; we're just going to choose one of them, whichever one has the highest weight. And the weight, remember, of a partial assignment is the product of all the factors that you can evaluate so far. [00:03:50] So let's pick this one. Again, let's set X2: there are two possible ways to set it; let's pick the better one, and keep on going until we reach a complete assignment, and then we just return that. So
formally, what greedy search is doing is starting with a partial assignment which is empty, and then going through each of the variables X1 through Xn, trying to extend the partial assignment to set Xi. [00:04:24] So for each possible value v that I can assign Xi, I'm going to form a potential candidate partial assignment, call it x_v, then compute the weight of each of these x_v's and choose the one with the highest weight. [00:04:45] An important caveat: this is definitely not guaranteed to find the maximum weight assignment, even though locally it appears to be optimizing, finding the value with the best weight. [00:04:58] So let's look at this demo to see how it works on object tracking. Okay, so here we have the CSP that's defined, and I'm going to step through this algorithm. Initially, I extend
the empty assignment to assignments that only fill in X1. So X1 could be 0, 1, or 2, and these are the weights of these three partial assignments. Remember, the sensor reading was 0, so X1 = 0 has the largest one. [00:05:36] In the next step I prune: I keep only the best candidate, which in this case is X1 = 0. Then I go to i = 2 and extend that assignment to three possible settings of X2, compute their weights, and keep the best one, which in this case is another 1. [00:06:02] And now I extend again to X3: three possible values to set X3, compute the weights of these, well, now complete assignments, and then choose the best one. [00:06:17] So in this case greedy search ends up with the assignment 0, 1, 1, with a weight of 4. And if you remember this example, the best weight assignment had weight 8, so 4 is definitely not the right answer, but it's not zero either; it found something.
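The greedy loop just stepped through can be sketched as follows. This is a hypothetical Python sketch reusing the assumed factor tables from the tracking example; ties are broken toward the first value considered, which is how the walkthrough ends up at 0, 1, 1 rather than the equally weighted 0, 1, 2.

```python
def dist_weight(a, b):
    # Assumed factor table: 2 at distance 0, 1 at distance 1, 0 otherwise.
    return max(0, 2 - abs(a - b))

readings = [0, 2, 2]

def partial_weight(x):
    """Product of all factors evaluable on the first len(x) variables."""
    w = 1
    for i, xi in enumerate(x):
        w *= dist_weight(xi, readings[i])
    for i in range(len(x) - 1):
        w *= dist_weight(x[i], x[i + 1])
    return w

def greedy(domain=(0, 1, 2), n=3):
    x = ()  # start with the empty assignment
    for _ in range(n):
        # extend with each value, keep only the single best partial assignment
        x = max((x + (v,) for v in domain), key=partial_weight)
    return x

best = greedy()
print(best, partial_weight(best))  # (0, 1, 1) with weight 4, as in the demo
```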
so four is definitely not the right answer but [00:06:36] is definitely not the right answer but it's not zero either it found something [00:06:42] okay so what's the problem with 3d [00:06:44] okay so what's the problem with 3d search is that it's too myopic it only [00:06:46] search is that it's too myopic it only keeps the best canada single best [00:06:49] keeps the best canada single best candidate [00:06:50] candidate so beam search is just the natural [00:06:52] so beam search is just the natural generalization [00:06:53] generalization of greedy where i'm keeping at most k [00:06:57] of greedy where i'm keeping at most k candidates at each level [00:07:00] candidates at each level so let's say k equals four [00:07:03] so let's say k equals four so i'm going to start with empty [00:07:06] so i'm going to start with empty assignment i'm going to x uh extend [00:07:11] assignment i'm going to x uh extend and then i don't need to prune because [00:07:13] and then i don't need to prune because there's only two [00:07:14] there's only two um possible partial assignments here [00:07:17] um possible partial assignments here and i have a capacity of four [00:07:19] and i have a capacity of four i'm going to extend again [00:07:21] i'm going to extend again again i don't need to prune [00:07:23] again i don't need to prune but then next i'm going to extend each [00:07:26] but then next i'm going to extend each of [00:07:27] of the elements on my beam the partial [00:07:30] the elements on my beam the partial assignments [00:07:31] assignments extend each of these [00:07:33] extend each of these and now i have eight and i now i need to [00:07:35] and now i have eight and i now i need to reduce the eight partial assignments to [00:07:38] reduce the eight partial assignments to four [00:07:40] four and to do this i'm going to simply [00:07:42] and to do this i'm going to simply compute the weight of each of these [00:07:44] compute the weight of each of these eight 
partial assignments and then take the four which have the highest weight. [00:07:49] Now let's suppose those are these four; then I'll continue, only expanding the ones I've kept, then again keeping the top four, and so on. [00:08:03] So notice that, visually, I'm exploring only a very, very small fraction of the tree, but I'm doing it somewhat holistically, looking down the tree; I could be exploring different parts of the tree at the same time. [00:08:26] So, formally, beam search keeps at most K candidate partial assignments. I'm going to initialize the candidate set C to be just the single partial assignment which is empty. Now, again like greedy search, I'm going to go through the variables one at a time and extend: I'm going to consider each partial
assignment in c and each possible value [00:08:55] and each possible value that i can assign x i [00:08:58] that i can assign x i and i'm gonna do perform the extend [00:09:01] and i'm gonna do perform the extend uh the assignment [00:09:03] uh the assignment and i'm just gonna keep track a c prime [00:09:06] and i'm just gonna keep track a c prime is going to be the new set of candidates [00:09:09] is going to be the new set of candidates and then now i'm going to prune that set [00:09:12] and then now i'm going to prune that set by computing the weight for each [00:09:14] by computing the weight for each element of c prime [00:09:17] element of c prime and just keeping the top k elements [00:09:22] and just keeping the top k elements so this is not guaranteed to find the [00:09:23] so this is not guaranteed to find the maximum weight assignment [00:09:25] maximum weight assignment either [00:09:27] either but sometimes it works better so let's [00:09:29] but sometimes it works better so let's look at this example [00:09:33] object tracking [00:09:35] object tracking and [00:09:36] and i stay extend from the empty assignment [00:09:39] i stay extend from the empty assignment to get three partial assignments to x1 [00:09:43] to get three partial assignments to x1 um [00:09:44] um i prune to the top three so nothing gets [00:09:48] i prune to the top three so nothing gets removed and then extend [00:09:50] removed and then extend so each of these three [00:09:53] so each of these three partial assignments gets extended into [00:09:56] partial assignments gets extended into three additional ones now i have nine [00:10:00] three additional ones now i have nine and now i'm going to prune down from [00:10:02] and now i'm going to prune down from nine to three so that will keep [00:10:05] nine to three so that will keep all the [00:10:07] all the assignments here with a positive void [00:10:10] assignments here with a positive void and now i extend again [00:10:12] and 
now i extend again to [00:10:14] to find settings of x3 [00:10:17] find settings of x3 compute each of these weights and then [00:10:19] compute each of these weights and then i'm going to take the top [00:10:23] assignments [00:10:26] okay so now [00:10:28] okay so now notice that [00:10:29] notice that the top assignment that i have right now [00:10:32] the top assignment that i have right now is [00:10:33] is one two two with a weight of eight [00:10:36] one two two with a weight of eight in this case i got lucky and i found [00:10:39] in this case i got lucky and i found actual max weight assignment but in [00:10:42] actual max weight assignment but in general you won't be guaranteed [00:10:49] okay so what is the time complexity of [00:10:51] okay so what is the time complexity of beam search because one of the [00:10:53] beam search because one of the advantages is that's supposed to be fast [00:10:56] advantages is that's supposed to be fast so let's do a simple calculation here so [00:10:58] so let's do a simple calculation here so suppose we have n variables which is the [00:11:00] suppose we have n variables which is the depth of this tree [00:11:01] depth of this tree and suppose that each of the variables [00:11:04] and suppose that each of the variables has [00:11:05] has v elements which is going to be the [00:11:07] v elements which is going to be the branching factor here [00:11:09] branching factor here and then the beam size is k [00:11:11] and then the beam size is k okay so what is the time [00:11:12] okay so what is the time that it takes to run beam search [00:11:15] that it takes to run beam search it's going to be for each of [00:11:18] it's going to be for each of the variables [00:11:19] the variables each level of this tree [00:11:22] each level of this tree we're going to have a set of candidates [00:11:24] we're going to have a set of candidates which is of size k [00:11:27] which is of size k and [00:11:28] and the extension [00:11:30] 
the extension is is going to take each of these k [00:11:32] is is going to take each of these k and extend it into b candidates so then [00:11:35] and extend it into b candidates so then i'm going to end up with kb [00:11:38] i'm going to end up with kb extended candidates total [00:11:40] extended candidates total and then i'm going to take have to take [00:11:43] and then i'm going to take have to take the top k [00:11:44] the top k so the time it takes to take a list of [00:11:47] so the time it takes to take a list of kb elements and select the top k [00:11:51] kb elements and select the top k elements is kb log k by building a heap [00:11:56] elements is kb log k by building a heap so the total time is n kb log k [00:11:59] so the total time is n kb log k and importantly [00:12:01] and importantly this is linear in the number of [00:12:04] this is linear in the number of variables whereas backtracking search [00:12:06] variables whereas backtracking search would be exponential and the number of [00:12:08] would be exponential and the number of variables [00:12:11] okay so let us summarize now [00:12:14] okay so let us summarize now so beam search is a fairly simple [00:12:16] so beam search is a fairly simple heuristic to [00:12:18] heuristic to approximate uh maximum weight [00:12:20] approximate uh maximum weight assignments and it's really done if [00:12:22] assignments and it's really done if you're really in a hurry and you don't [00:12:24] you're really in a hurry and you don't really care about getting maximum weight [00:12:25] really care about getting maximum weight assignment because you probably won't [00:12:28] assignment because you probably won't so um the nice thing about beam search [00:12:30] so um the nice thing about beam search is it has this parameter k which allows [00:12:32] is it has this parameter k which allows you to control the trade-off between [00:12:34] you to control the trade-off between efficiency and accuracy [00:12:37] 
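As a concrete sketch of the procedure just described (this code is my illustration, not from the lecture), here is a minimal beam search in Python. The `partial_weight` scorer encodes the running object-tracking example, with factor values inferred from the worked weights in this lecture series (2 for exact agreement, 1 for a difference of one, 0 otherwise; that exact shape is an assumption):

```python
import heapq

def beam_search(domains, partial_weight, k):
    """Keep at most k partial assignments per level; prune by weight."""
    candidates = [()]  # the single empty assignment
    for domain in domains:
        # Extend every candidate on the beam with every possible next value.
        extended = [c + (v,) for c in candidates for v in domain]
        # Prune the kb extended candidates down to the top k
        # (kb log k time via a heap).
        candidates = heapq.nlargest(k, extended, key=partial_weight)
    return max(candidates, key=partial_weight)

# Object-tracking example: noisy sensor readings 0, 2, 2.
OBS = [0, 2, 2]

def factor(a, b):
    # Assumed factor values: 2 if equal, 1 if off by one, 0 otherwise.
    return 2 if a == b else (1 if abs(a - b) == 1 else 0)

def partial_weight(prefix):
    # Product of the factors evaluable on a partial assignment:
    # observation factors o_i, plus transition factors t_i seen so far.
    w = 1
    for i, x in enumerate(prefix):
        w *= factor(x, OBS[i])
        if i > 0:
            w *= factor(prefix[i - 1], x)
    return w

best = beam_search([[0, 1, 2]] * 3, partial_weight, k=3)
print(best, partial_weight(best))  # (1, 2, 2) 8
```

With k = 1 this degrades to greedy search and, on this example, gets stuck at a weight of 4, matching the lecture's discussion.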
[00:12:39] So if you're really in a hurry, you set k equals one and you just get greedy search, which sometimes actually gets you pretty good answers. And as you increase k more and more, if you increase k to infinity, then you'll definitely search the entire search tree and you will get the optimal answer, but this takes basically exponential time. One thing to note about beam search with k equals infinity is that it is performing a breadth-first search of the tree, because it proceeds level by level and explores all the nodes in the tree systematically.

[00:13:17] So using this analogy, I want to end with a final note, which is that backtracking search is really like doing a depth-first search on the search tree: it dives deeply down to one complete assignment, then backtracks and finds another complete assignment, and backtracks again, looking at kind of one assignment at a time. Beam search, on the other hand, is more akin to breadth-first search, where we're proceeding level by level; but the main difference from breadth-first search is that we're doing this heuristic pruning at each level, to make sure that we don't have too many candidates. And the way it's doing that pruning is based on the factors that it can evaluate so far. So for beam search to work, you really need it to be the case that the factors are local and can be evaluated as much as possible along the way, not all at the very end.

[00:14:21] All right, so that's the end of this module.

================================================================================
LECTURE 029
================================================================================
Constraint Satisfaction Problems (CSPs) 7 - Local Search | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=VwZKPlK6jUg

---

Transcript

[00:00:05] Hi, in this module I'm going to talk about local search, a strategy for approximately computing the
maximum weight assignment of a constraint satisfaction problem.

[00:00:16] So remember that a CSP is defined by a factor graph, which includes a set of variables x1 through xn and a set of factors f1 through fm, where each factor is a function that depends on a subset of the variables and returns a non-negative number. Each assignment to all the variables has a weight, given by the product of all the factors evaluated on that assignment, and the objective is to compute the maximum weight assignment, as usual.

[00:00:48] So far we've seen backtracking search and beam search, and both of these search algorithms work by extending partial assignments: you start with the empty assignment, then you assign one variable, and you assign another variable, until you get to a complete assignment, and then maybe you backtrack or maybe you don't.

[00:01:09] So local search is going to be a little bit different: it's going to modify complete assignments. You start with a random assignment, and then you choose one variable and change it, choose another variable and change it, and so on; more kind of like house maintenance rather than building a house.

[00:01:28] One of the advantages of local search is that it gives you additional flexibility: you can pick any variable and try to improve it. Whereas with backtracking search and beam search, you have to do things in a certain order. With beam search, once you've assigned a variable, you can't go back. And with backtracking search, you can backtrack, but you can't really backtrack out of order.

[00:01:56] So recall our running example, object tracking. At each time step you observe a noisy sensor reading of a particular object: you observe 0, 2, and 2 as the positions of the object, and you're trying to
figure out where this object was.

[00:02:15] So we did model this as a CSP, where we have three observation factors: o1, which favors x1 equals 0; o2, which favors x2 equals 2; and o3, which favors x3 equals 2. And we have two transition factors that favor subsequent positions being close by.

[00:02:45] So let's jump in, and suppose we just have a complete assignment (0, 0, 1). Okay, the question is: how do we improve this? Well, let's look at the weight of this assignment. The weight is 2, because 0 agrees with 0; times 2, because 0 agrees with 0; times 0, uh-oh, because these two are too far apart; times 1, because these only differ by one; and times 1, because these differ by one. But you get a zero, so that's not a very good assignment. So how can we improve? Let's try to reassign x2 to something else: let's assign it some value v, so we can set v equals 0, 1, or 2, and for each of these alternate assignments compute its weight. Then we simply take the assignment with the best weight, in this case the one which sets x2 to be 1. Then we end up with a new assignment which is better than the old one, so mission accomplished.

[00:04:05] So we can refine this strategy a little bit more. Suppose we're trying to reassign x2. The weight of a new assignment where x2 has been replaced with some v is as follows: you're multiplying all the factors in the CSP together, o1, t1, o2, t2, o3. But note that only some of the factors depend on v; in particular, o1 and o3 don't depend on v, so no matter what v is, these are the same, which means we can ignore them and just evaluate the factors that involve x2. So this is an idea of locality, which leverages the
structure of the CSP: when evaluating possible reassignments to some variable x_i, we only need to consider the factors that depend on x_i. So in a factor graph where there are lots and lots of variables, and you're trying to reassign one variable which might have a small neighborhood, you're saving a lot of effort.

[00:05:19] So now we're ready to define our local search algorithm, which is called iterated conditional modes. Sounds fancy, but it's really simple. The idea is that we're going to start with x being a random complete assignment, and we're going to loop through x1 through xn, and keep on going until we converge or we run out of time. What we do is try to reassign x_i: we consider each possible value v that x_i could take on, and update the current assignment x with that value. Okay, so this produces an assignment x_v, and then we compute the weights of each of these x_v's and choose the one with the highest weight. Remember that in computing the weight, we only need to evaluate the factors that touch x_i. Also notice that this looks remarkably like greedy search or beam search, but there is a substantial difference: here the x's are complete assignments, not partial assignments, so this is not extending an assignment so much as replacing x_i with a new value.

[00:06:43] So pictorially, what this looks like is: you start with x1. By convention, unshaded nodes are the ones that are meant to be reassigned, and shaded ones are the ones that are fixed. So you pick x1 and you say: can I change it to make it better? Then you pick some value of x1, and you go to x2 and say: can I change x2
to make this assignment better? And then you go to x3, and then you go back to x1 and say: hey, can I make it better by changing x1 again? You keep on going until it converges.

[00:07:24] So here's a demo on the object tracking example. At the start of the algorithm, we initialize with a random assignment, (0, 1, 2), which has a weight of 4. Now I'm going to try to maximize x1 given everything else. Let's consider alternative values of x1: it could be 0, 1, or 2. For each of these I compute its weight, only evaluating the factors that touch x1; in this case it's only o1 and t1 that touch x1, so I only need to evaluate those. I compute the weights and choose the best one, breaking ties arbitrarily, so I choose x1 equals 0, which means I didn't change anything. So now let me step: now I'm looking at x2. Can I change anything? Nope. What about x3? Well, I compute the weights, and here I'm choosing x3 to be 1, so I change that assignment. Now I go back to x1 and iterate, and it looks like I've converged, because I'm not changing anything.

[00:09:00] So I've converged to an assignment with a weight of 4, which, if you remember, is not the optimum: the maximum weight assignment has weight eight. So again, iterated conditional modes is going to give you an okay solution, but not necessarily the best one.

[00:09:26] So, convergence properties. The good news is that the weight of your assignment is not going to go down; it always increases or stays the same each iteration. And this is because when you're trying to reassign a variable, you can always
choose the old value and maintain the same weight, so any change must be increasing the weight. This means that it converges in a finite number of iterations, because there's only a finite number of possible assignments, so you can only increase the weight a finite number of times. It can get stuck at a local optimum, as we've seen, and it's not generally guaranteed to find the optimum assignment.

[00:10:16] So just a quick note: there are two ways around this. One is that there is a version of this where you can change two variables, or maybe three variables, at a time, and that allows you to perhaps get out of your local optimum. Another thing we can do is add randomness: at each step, we could either choose the best option or just choose a random option, and this will also allow us to escape these local optima. Or we can use something like Gibbs sampling, which I'll talk about in a future module, which will add stochasticity to ICM.

[00:11:01] Okay, so here is the summary. Let me actually summarize all the search algorithms for CSPs that we've encountered. First we looked at backtracking search, where the strategy is to extend partial assignments and then backtrack when we get to a complete assignment. Backtracking search is exact: it computes the actual maximum weight assignment, and it's the only algorithm we're considering that does that in general. But the main problem is that the time can be exponential in the number of variables.

[00:11:46] Then we looked at beam search, which also extends partial assignments, and here we're trading off accuracy for time. So this is approximate; it will only give you an okay solution, but
it's linear in the number of variables.

[00:12:06] And for local search, we saw iterated conditional modes, which does local search by choosing the best value of a variable at each given time. It's a different strategy: here we're starting with complete assignments and modifying them to make them better. So it's also approximate, but it's fast, just like beam search.

[00:12:34] Okay, so that concludes this module.

================================================================================
LECTURE 030
================================================================================
Markov Networks 1 - Overview | Stanford CS221: Artificial Intelligence (Autumn 2021)
Source: https://www.youtube.com/watch?v=neeaJb3wCYw

---

Transcript

[00:00:05] Hi, in this module I'm going to be talking about Markov networks. So far we've introduced constraint satisfaction problems, the first of our variable-based models. Now we're going to talk about Markov networks, the second type of variable-based model, which will connect factor graphs with probability, and this will be a stepping stone along the way to Bayesian networks.

[00:00:28] So recall that variable-based models are all based on factor graphs, and Markov networks are no different. Remember that a factor graph consists of a set of variables x1 through xn and a set of factors f1 through fm, where each factor takes a subset of the variables and returns a non-negative number. If you multiply all of these numbers together, you can evaluate the weight of a particular assignment.

[00:00:58] So let's look at the example of object tracking. Here, remember, the goal is: over time, we record a noisy sensor reading of an object's position, at 0, 2, and 2, and the goal is to figure out the actual trajectory of this object. We model this as a factor graph as follows, where we have a number of factors representing the affinity for x1 to be close to 0, x2 to be close to 2, and x3 to be close to 2, and also for adjacent positions to be close to each other.
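The factor graph just described can be sketched in code (my illustration, not from the lecture; the exact factor values, 2 for agreement, 1 for a difference of one, 0 otherwise, are an assumption inferred from the worked weights in these lectures):

```python
OBS = [0, 2, 2]  # noisy sensor readings for x1, x2, x3

def close(a, b):
    # Assumed factor values: 2 for exact agreement,
    # 1 for a difference of one, 0 otherwise.
    return 2 if a == b else (1 if abs(a - b) == 1 else 0)

# The factor graph: observation factors o1..o3 and transition factors t1, t2.
factors = [
    lambda x: close(x[0], OBS[0]),  # o1: x1 near 0
    lambda x: close(x[1], OBS[1]),  # o2: x2 near 2
    lambda x: close(x[2], OBS[2]),  # o3: x3 near 2
    lambda x: close(x[0], x[1]),    # t1: x1 near x2
    lambda x: close(x[1], x[2]),    # t2: x2 near x3
]

def weight(x):
    # Weight of an assignment = product of all factors evaluated on it.
    w = 1
    for f in factors:
        w *= f(x)
    return w

print(weight((1, 2, 2)))  # prints 8, the lecture's maximum weight assignment
```

Under these assumed values, the assignments from the earlier demos check out: (0, 1, 2) gets weight 4, and (0, 0, 1) gets weight 0 because one transition factor is violated.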
[00:01:42] So before, we treated this factor graph as a constraint satisfaction problem, where the goal is to find the maximum-weight assignment. In this particular example we can look at all the possible assignments, each of which has a weight, and find that the maximum-weight assignment is (1, 2, 2). But just returning a single maximum-weight assignment doesn't really give us the full picture. In particular, it doesn't represent how certain we are of this assignment, or say anything about all the other possibilities.

[00:02:22] So the goal of Markov networks is to try to capture this uncertainty over assignments using the language of probability. We've actually done most of the hard work already by setting up factor graphs; the only remaining part is to connect factor graphs with probability. So formally, a Markov network, or Markov random field as
it's sometimes called, is a factor graph which defines a joint distribution over a set of random variables x1 through xn. Before, these were just variables; now they're random variables, because we'll be talking about probabilities.

[00:02:57] Remember that the factor graph gives us a weight for each possible assignment x, and to convert this weight into a probability we just need to normalize it. What I mean by that is: I look at all possible assignments and their weights, and I define Z as the sum of all the weights. Z is called the normalization constant, or sometimes the partition function. Then I just divide each weight by Z. This produces something that sums to 1, and I define that as the joint distribution P(X = x) = Weight(x) / Z. Okay, so let's do this example.
[00:03:46] Here we have x1, x2, x3 and the weight of each assignment x; we have six possible nonzero-weight assignments, with particular weights. We add all these weights up, and that gives us the partition function Z, which is 26 here. Then we divide each of these weights by 26 to produce the joint probability.

[00:04:16] And so now this probability distribution represents the uncertainty in the problem. Notice that while (1, 2, 2) was the maximum-weight assignment, and it still is, the probability gives us a more nuanced picture, which is that we're only 31 percent sure that that is actually the true trajectory of the object. This could be useful information: there's a big difference between 31 percent and 90 percent.
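To make the normalization step concrete, here is a small Python sketch. The weight table is hypothetical: six nonzero-weight assignments over (x1, x2, x3), with weights chosen so that Z = 26 as quoted in the lecture (the exact table on the slides may differ):

```python
# Hypothetical weights for the six nonzero-weight assignments (x1, x2, x3),
# chosen so the partition function Z = 26, matching the lecture's numbers;
# all other assignments have weight 0 and are omitted.
weights = {
    (0, 1, 1): 4,
    (1, 1, 1): 4,
    (0, 1, 2): 4,
    (1, 1, 2): 4,
    (1, 2, 2): 8,
    (2, 2, 2): 2,
}

# Normalization constant (partition function): the sum of all weights.
Z = sum(weights.values())

# Joint distribution: each weight divided by Z, so the values sum to 1.
joint = {x: w / Z for x, w in weights.items()}

print(Z)                           # 26
print(round(joint[(1, 2, 2)], 2))  # 0.31: the max-weight assignment
```

Note that the max-weight assignment keeps its rank after normalization; dividing by Z only rescales the weights into probabilities.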
[00:04:46] But wait, we can do more than that. The language of probability allows us to answer other questions besides just the probabilities of whole assignments. For example, suppose we wanted to know where the object was at time step 2. That is: what is the value of the random variable x2, where I don't care about x1 and x3? This query is captured by a quantity called the marginal probability. The marginal probability that a particular random variable Xi equals a particular value v, written P(Xi = v), is given by summing the joint distribution, which we defined on the previous slide, over all possible full assignments x such that xi = v, that is, all assignments consistent with this condition.

[00:05:46] So now let's look at this object tracking example again. We have the joint probability table that we computed on the previous slide, and now let's compute some marginal probabilities. First, let's
consider: what is the probability that x2 equals 1? To do that, we look at all the rows where x2 is 1, which are these first four here, and then we just add up their probabilities, 0.15 + 0.15 + 0.15 + 0.15, and that gives us 0.6. Now we can issue another marginal probability query: what's the probability that x2 equals 2? We look at all the rows where x2 is 2, which are these last two rows, add up their probabilities, and that gives us 0.38. (There's some rounding error here, which is why these don't add up exactly.) [00:06:42] Okay, so that allows us to answer marginal probability queries.
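A marginal query is then just a filtered sum over the joint table. Here is a sketch; the joint table is hypothetical (weights picked so that Z = 26 and the marginals land near the lecture's 0.62 and 0.38):

```python
# Hypothetical joint table over assignments (x1, x2, x3), weights summing to 26.
weights = {(0, 1, 1): 4, (1, 1, 1): 4, (0, 1, 2): 4, (1, 1, 2): 4,
           (1, 2, 2): 8, (2, 2, 2): 2}
Z = sum(weights.values())
joint = {x: w / Z for x, w in weights.items()}

def marginal(i, v):
    """P(X_i = v): sum the joint over all full assignments with x_i = v."""
    return sum(p for x, p in joint.items() if x[i] == v)

print(round(marginal(1, 1), 2))  # P(x2 = 1) = 16/26, about 0.62
print(round(marginal(1, 2), 2))  # P(x2 = 2) = 10/26, about 0.38
```

Working with the exact fractions (16/26 and 10/26) rather than the rounded table entries avoids the rounding error mentioned above.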
[00:06:46] One thing you might note is that the answer here is actually different from what you get if you just look at the max-weight assignment. In particular, the maximum-weight assignment says the most likely full assignment is (1, 2, 2), and if you look at x2 there, it says 2. But notice that that is not the most likely value under the marginal probability: the most likely value for x2 under the marginal is 1, and it has a 62 percent chance of being 1 in that case.

[00:07:23] So the intuition here is that while this trajectory does indeed have the largest weight, there is actually a lot of decentralized evidence for x2 = 1, from assignments which individually have less weight but have strength in numbers: if you add up all their weights, they actually outnumber the evidence for x2 = 2. This is an important lesson: what answer you get really depends on the type of question you're asking. In this case, if you're really interested in where the object is at time step 2, then marginal probability is the right thing to ask for.

[00:08:06] So let's look at a particular example. The Ising model is a very
canonical example that dates back to the 1920s, from statistical physics, and the idea is that it's a model of ferromagnetism.

[00:08:24] The idea is that you have a Markov network which contains a bunch of different sites. Each site is denoted xi, which can take on two values, -1 and +1: -1 represents a down spin and +1 represents an up spin. Furthermore, these variables are related by factors: we'll call the factor f_ij, which connects site i and site j; it depends on the spin of site i and the spin of site j, and it equals exp(β · xi · xj).

[00:09:14] Okay, so the intuition is that we want neighboring sites to have the same spin. By multiplying the two spins together, if both of them have the same sign the product is going to be 1, and if
they have opposite signs the product is going to be -1. And β here is a scaling factor that says how strong the affinity is. If β is 0, then the factor is just exp(0), which is 1, so there's no connection between the sites; and as β increases, the affinity becomes stronger, and the difference between agreeing and not agreeing becomes heightened.

[00:09:56] One thing Ising models are useful for is studying phase transitions in physical systems. Here is an example of what happens as β increases. If β is close to 0, then you basically get unstructured configurations where each site behaves independently; in fact, if β is 0, then all assignments are equally likely. As β increases, you'll see that more and more coherence happens, where neighboring sites really like to agree.
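To make the factor concrete, here is a sketch of the Ising pairwise factor and the weight of a full spin configuration; the 2×2 grid of sites and its edge list are made up for illustration:

```python
import math

def ising_factor(xi, xj, beta):
    """Pairwise factor f_ij(x_i, x_j) = exp(beta * x_i * x_j), spins in {-1, +1}."""
    return math.exp(beta * xi * xj)

def config_weight(x, edges, beta):
    """Weight of a full spin assignment: the product of factors over all edges."""
    w = 1.0
    for i, j in edges:
        w *= ising_factor(x[i], x[j], beta)
    return w

# Hypothetical 2x2 grid of sites 0..3 with 4 neighbor edges.
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]
aligned = [+1, +1, +1, +1]   # all spins agree
mixed = [+1, -1, +1, -1]     # neighbors disagree on two of the edges

# With beta = 0 every configuration has weight exp(0)^4 = 1 (no coupling);
# with beta > 0 the aligned configuration is favored.
print(config_weight(aligned, edges, 0.0))  # 1.0
print(config_weight(aligned, edges, 0.5) > config_weight(mixed, edges, 0.5))  # True
```

Setting beta to 0 in this sketch reproduces the "all assignments equally likely" regime; raising it heightens the gap between aligned and mixed configurations.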
[00:10:34] But of course there are going to be some sharp ridges where two neighbors have to disagree. How we sample from this model is going to be a topic for another module.

[00:10:51] So here is another canonical application of Markov networks, from computer vision, where this used to be very popular. The idea is that you take a noisy image and you want to denoise it into a clean image. We'll present a very stylized, simple example of this. Here is our three-by-five image: each site is a pixel, and xi, which is either 0 or 1, is the pixel value, which is unknown; we're modeling the clean image. We assume that only a subset of the pixels are observed, maybe this one, this one, this one, this one, and this one, and the goal is to fill in the rest of the pixels. So
we can capture an observation by an observation potential o_i(x_i), which is 1 if x_i agrees with the observation and 0 if it doesn't. This is a hard constraint that says: where I observed a value, x_i must take on that value; so this one has to be 0, this one has to be 1, and so on. [00:12:04] And finally, we have transition factors that say neighboring pixels are more likely to be the same than different, again the same intuition as in the Ising model. We denote this t_ij, and it equals 2 if the two neighboring pixels agree and 1 if they disagree.
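The two kinds of factors for the denoising model can be sketched directly; the dictionary of observed pixels below is hypothetical:

```python
# Hypothetical observations: pixel index -> observed value (0 or 1).
observed = {0: 0, 7: 1, 14: 1}

def observation_potential(i, xi):
    """o_i(x_i): hard constraint. 1 if x_i agrees with the observed value
    (or pixel i was never observed), 0 if it contradicts an observation."""
    if i not in observed:
        return 1
    return 1 if xi == observed[i] else 0

def transition_factor(xi, xj):
    """t_ij(x_i, x_j): neighboring pixels prefer to agree.
    2 if the two pixel values are equal, 1 if they differ."""
    return 2 if xi == xj else 1

print(observation_potential(0, 0))  # 1: agrees with the observation
print(observation_potential(0, 1))  # 0: contradicts it, weight forced to zero
print(transition_factor(1, 1))      # 2: neighbors agree
print(transition_factor(1, 0))      # 1: neighbors disagree
```

Because the observation potential can be 0, any assignment that contradicts an observed pixel gets total weight 0, which is exactly the hard-constraint behavior described above.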
[00:12:35] So let me summarize Markov networks. You can think of them succinctly as taking factor graphs and marrying them with probability. Factor graphs have already done a lot of the work: they allow you to specify a non-negative weight for every assignment, and all we have to do is normalize that to get a probability distribution. Once we have the probability distribution, we can answer all sorts of queries, for example computing marginal probabilities, which allow us to pinpoint individual variables and ask questions about them.

[00:13:13] It's also useful to compare Markov networks with CSPs. In CSPs we talked about variables; in Markov networks we call them random variables. They behave like variables, but they're random variables because we're endowing them with a probabilistic interpretation. In CSPs we talked about weights; in Markov networks we talk about probabilities, which are the normalized weights. And the main difference is that in CSPs we were trying to find the maximum-weight assignment, while in Markov networks we look at the distribution over assignments holistically,
and answering questions about marginal probabilities, which gives us a more nuanced idea of the set of possible assignments. [00:14:04] Okay, that's it for this module.

================================================================================ LECTURE 031 ================================================================================ Markov Networks 2 - Gibbs Sampling | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=k6aZZF2pk7k --- Transcript

[00:00:06] Hi. In this module I'm going to talk about Gibbs sampling, a simple algorithm for approximately computing marginal probabilities.

[00:00:15] So recall that a Markov network is based on a factor graph, and a factor graph gives a weight to every possible assignment of the variables in that factor graph. In a Markov network we convert that weight into a probability by first computing the normalization constant, which is the sum over all assignments of the weight of that assignment; we divide by that normalization constant, and we get the probability of an assignment little x.

[00:00:49] So in this object tracking example,
we have a bunch of different assignments and their weights. The partition function in this case is 26; we divide each of these weights by 26 and we get these probabilities.

[00:01:06] The cool thing with Markov networks is that you can compute marginal probabilities, and that's going to be our focus today. A marginal probability focuses on one particular variable xi and asks what values it could take on. To get it, we sum the joint probability over all possible assignments where xi actually equals the value in question. In this example, if you ask for the probability that x2 equals 1, you sum over all the rows where x2 is 1, and that gives you 0.62; and if you ask for x2 equals 2, you sum over the last two rows, and that gives you 0.38.

[00:01:56] So now let me present Gibbs sampling,
a simple algorithm for approximately computing these marginals. You could iterate over all possible assignments and compute them exactly, but that would take exponential time. Gibbs sampling follows the template of local search, where we go through each variable one at a time and update it; but unlike iterated conditional modes, which we saw before, Gibbs sampling is a randomized algorithm, tailored for the purpose of computing marginals.

[00:02:27] So let's present the algorithm. We initialize the assignment to some completely random assignment, and then we loop through each of the variables until convergence, which I'll talk about a little bit later. For each variable, we set xi = v with probability P(Xi = v | X_{-i} = x_{-i}). This x_{-i} notation
just refers to all the variables except for xi. I'll come back to this in a second, but let me highlight the general flow of the algorithm. Suppose you have three variables. Gibbs sampling will sample x1 holding the other ones fixed, then move on to x2 holding the others fixed and update x2, then go to x3, and then cycle back to x1, x2, x3, and so on.

[00:03:31] Now, how do I sample xi = v? Here is one example. What we do is try assigning xi = v and computing the resulting weight, so for every possible value of x2 I get some weight. Now, remember, in ICM I would simply take the value that produces the largest weight. The main difference with Gibbs sampling is that I take these weights and
normalize them to produce a probability distribution: normalizing means summing these values, which here gives 5, and dividing by 5 to get the probabilities 0.2, 0.4, 0.4. Then I sample one of these values for x2 according to this probability distribution. [00:04:29] You can visualize that sampling process with the interval from 0 to 1, where I have a number of segments representing the different possible values of x2, and each segment's length is exactly the corresponding probability: the probability of x2 = 0, the probability of x2 = 1, and the probability of x2 = 2.
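The "dart" described next is just sampling from a categorical distribution via cumulative segment lengths. A minimal sketch, where the weights 1, 2, 2 mirror the example that normalizes to 0.2, 0.4, 0.4:

```python
import random

def sample_categorical(probs, rng=random):
    """Throw a one-dimensional dart at [0, 1): lay the probabilities out as
    consecutive segments and return the index of the segment the dart hits."""
    dart = rng.random()
    cumulative = 0.0
    for value, p in enumerate(probs):
        cumulative += p
        if dart < cumulative:
            return value
    return len(probs) - 1  # guard against floating-point round-off

# Weights 1, 2, 2 normalize to probabilities 0.2, 0.4, 0.4.
weights = [1, 2, 2]
probs = [w / sum(weights) for w in weights]

random.seed(0)
counts = [0, 0, 0]
for _ in range(10_000):
    counts[sample_categorical(probs)] += 1
# Empirical frequencies should be close to 0.2, 0.4, 0.4.
print([c / 10_000 for c in counts])
```

Running many darts and counting where they land is exactly the "counting and normalizing" that the estimate at the end of the algorithm relies on.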
[00:04:54] Then I throw a one-dimensional dart at this line; I hit it somewhere, and I take whatever value is specified by the segment I land in. Okay, so now I have a new value for x2, and I proceed to the next variable.

[00:05:18] So that produces a sequence of sampled assignments, and the main remaining thing to do is to aggregate them. Every time I go through this loop, I increment a counter, for variable i, of the particular value that I saw. At the very end, I compute an estimate p̂(Xi = xi), which is simply the normalized version of the count: the relative frequency of seeing a particular value little xi compared to everything else I've seen. So really, it's just a lot of counting and normalizing.
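Putting the pieces together, here is a sketch of a full Gibbs sampler on a small three-variable chain. The factors are made up (observation factors pulling x1 toward 0 and x2, x3 toward 2, plus transition factors preferring nearby positions); they are in the spirit of the object-tracking example but do not reproduce the exact table from the slides:

```python
import random

obs = [0, 2, 2]  # hypothetical noisy sensor readings for x1, x2, x3

def weight(x):
    """Weight of a full assignment: observation factors times transition factors."""
    w = 1.0
    for i in range(3):
        w *= 2.0 if x[i] == obs[i] else 1.0             # pull x_i toward its reading
    for i in range(2):
        w *= 2.0 if abs(x[i] - x[i + 1]) <= 1 else 1.0  # adjacent positions stay close
    return w

def gibbs(num_sweeps=20_000, seed=0):
    rng = random.Random(seed)
    x = [rng.randrange(3) for _ in range(3)]  # completely random initial assignment
    counts = [[0, 0, 0] for _ in range(3)]    # counts[i][v]: times we saw x_i = v
    for _ in range(num_sweeps):
        for i in range(3):
            # Conditional distribution P(x_i = v | x_{-i}): weigh each candidate
            # value with the other variables held fixed, then normalize.
            ws = []
            for v in range(3):
                x[i] = v
                ws.append(weight(x))
            z = sum(ws)
            dart, cum = rng.random() * z, 0.0
            for v, wv in enumerate(ws):
                cum += wv
                if dart < cum:
                    x[i] = v
                    break
            counts[i][x[i]] += 1  # increment the counter for the value we saw
    # Estimated marginals: normalized counts (relative frequencies).
    return [[c / num_sweeps for c in row] for row in counts]

est = gibbs()
print([round(p, 2) for p in est[1]])  # estimated marginal distribution of x2
```

For a chain this small the marginals can also be computed exactly by enumeration, which is a good way to sanity-check the sampler's estimates.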
counting and [00:06:16] but there's a lot of counting and normalizing [00:06:19] normalizing so let's look at this demo to uh give us [00:06:22] so let's look at this demo to uh give us a [00:06:23] a more fuller sense what's going on okay [00:06:26] more fuller sense what's going on okay so here is the object tracking example i [00:06:27] so here is the object tracking example i have three variables and here i'm going [00:06:30] have three variables and here i'm going i can specify the query which is which [00:06:33] i can specify the query which is which variable am i interested in calculating [00:06:35] variable am i interested in calculating the marginal of [00:06:37] the marginal of and i'm going to run gibbs sampling here [00:06:41] and then at the beginning [00:06:43] and then at the beginning i sample a variable x 1 [00:06:46] i sample a variable x 1 given everything else so consider all [00:06:49] given everything else so consider all the possible values of x1 [00:06:51] the possible values of x1 um i'm going to look at their [00:06:53] um i'm going to look at their potentials or factors computer weight [00:06:57] potentials or factors computer weight normalize [00:06:59] normalize to get a distribution and i'm going to [00:07:01] to get a distribution and i'm going to sample a value according to [00:07:03] sample a value according to these probabilities so this in this case [00:07:06] these probabilities so this in this case is just a coin flip i choose x1 equals [00:07:08] is just a coin flip i choose x1 equals zero [00:07:09] zero and then i update my counter [00:07:12] and then i update my counter so i'm recording that i saw [00:07:14] so i'm recording that i saw x2 equals ones once [00:07:19] x2 equals ones once okay [00:07:20] okay and then i'm going to move on to the [00:07:21] and then i'm going to move on to the next variable x2 do the same thing [00:07:24] next variable x2 do the same thing move to the next variable x3 and do the [00:07:27] move 
to the next variable x3 and do the same thing and i'm going to just cycle [00:07:29] same thing and i'm going to just cycle for this for a moment you can see that [00:07:32] for this for a moment you can see that um [00:07:33] um the the assignment which is depicted up [00:07:36] the the assignment which is depicted up here is changing [00:07:39] here is changing um [00:07:40] um and down here [00:07:42] and down here um i can see that the count [00:07:45] um i can see that the count of the number of times x1 [00:07:47] of the number of times x1 x2 equals 1 has gone up to 25 [00:07:51] x2 equals 1 has gone up to 25 um and now look i actually hit a [00:07:54] um and now look i actually hit a different value x i went to a [00:07:56] different value x i went to a configuration where x2 equals two now [00:07:59] configuration where x2 equals two now um [00:08:00] um and then i might sample a little bit [00:08:03] and then i might sample a little bit more and they'll come back to one [00:08:05] more and they'll come back to one and you can just watch this for a little [00:08:08] and you can just watch this for a little while and you can see over here that [00:08:11] while and you can see over here that these are the estimates of the marginal [00:08:14] these are the estimates of the marginal probability of x2 based on the counts so [00:08:17] probability of x2 based on the counts so these numbers are simply these [00:08:19] these numbers are simply these normalized versions of these [00:08:22] normalized versions of these so i'm going to speed this up a little [00:08:24] so i'm going to speed this up a little bit so let me do just a thousand steps [00:08:27] bit so let me do just a thousand steps at a time [00:08:28] at a time okay so now i have a val if i did a [00:08:31] okay so now i have a val if i did a thousand steps of gibbs sampling now i [00:08:33] thousand steps of gibbs sampling now i have a lot of counts of x uh two equals [00:08:35] have a lot of counts of 
x uh two equals one some counts of x two equals two and [00:08:38] one some counts of x two equals two and now you can see the probabilities are [00:08:42] now you can see the probabilities are kind of converging to something like 0.6 [00:08:44] kind of converging to something like 0.6 and 0.3 let me just hit a step a few [00:08:47] and 0.3 let me just hit a step a few more times and you can see that these [00:08:50] more times and you can see that these probabilities are indeed converging to [00:08:53] probabilities are indeed converging to 0.61 [00:08:55] 0.61 which if you remember from here is [00:08:58] which if you remember from here is pretty close to the true marginal [00:09:00] pretty close to the true marginal probability [00:09:03] probability okay so [00:09:04] okay so it seems you know at first gland kind of [00:09:07] it seems you know at first gland kind of a wild thing right so we're running this [00:09:09] a wild thing right so we're running this algorithm it's just generating samples [00:09:11] algorithm it's just generating samples left and right it's [00:09:13] left and right it's kind of random and yet if i compute the [00:09:17] kind of random and yet if i compute the randomness is very carefully are [00:09:19] randomness is very carefully are orchestrated so that when i sum things [00:09:21] orchestrated so that when i sum things up properly i actually get the right [00:09:24] up properly i actually get the right answer out [00:09:28] so let me now go to the image de-noising [00:09:31] so let me now go to the image de-noising example so here the goal is you're given [00:09:33] example so here the goal is you're given a noisy image clean it up and in our [00:09:37] a noisy image clean it up and in our simplified version [00:09:38] simplified version i have [00:09:39] i have x i which represents the [00:09:42] x i which represents the clean pixel value [00:09:44] clean pixel value which i don't know [00:09:45] which i don't know a subset of the 
pixels are observed so [00:09:48] a subset of the pixels are observed so for example these um in green here [00:09:51] for example these um in green here and i'm going to clamp those pixel [00:09:52] and i'm going to clamp those pixel values to the observed value and then i [00:09:55] values to the observed value and then i have a [00:09:56] have a factor [00:09:58] factor that says neighboring pixels are twice [00:10:00] that says neighboring pixels are twice as likely to be the same than different [00:10:04] as likely to be the same than different so let's do give sampling in this image [00:10:06] so let's do give sampling in this image noise in case so what give sampling [00:10:08] noise in case so what give sampling would do is it's going to sweep across [00:10:11] would do is it's going to sweep across the image [00:10:12] the image and sample each variable condition on [00:10:14] and sample each variable condition on the left [00:10:16] the left so [00:10:17] so suppose [00:10:18] suppose i'm landing on this particular [00:10:21] i'm landing on this particular pixel value and i'm trying to figure out [00:10:22] pixel value and i'm trying to figure out what should its value be [00:10:25] what should its value be so again i look at the possible values [00:10:26] so again i look at the possible values it could be zero or one and for each [00:10:29] it could be zero or one and for each value i'm going to compute a weight [00:10:31] value i'm going to compute a weight so [00:10:32] so remember from icm that i actually don't [00:10:35] remember from icm that i actually don't need to [00:10:36] need to compute the weight of the entire [00:10:38] compute the weight of the entire assignment i just only need to look at [00:10:40] assignment i just only need to look at the factors which are dependent on this [00:10:43] the factors which are dependent on this value [00:10:44] value okay so [00:10:45] okay so let's consider v equals zero [00:10:48] let's consider v equals 
zero so here if i put zero here that means [00:10:50] so here if i put zero here that means this potential is going to be happy [00:10:52] this potential is going to be happy because years agree and i'm gonna get a [00:10:54] because years agree and i'm gonna get a two [00:10:55] two um [00:10:56] um and this one is going to disagree [00:10:59] and this one is going to disagree this one's going to disagree on this one [00:11:02] this one's going to disagree on this one so the weight is 2 times 1 times 1 times [00:11:05] so the weight is 2 times 1 times 1 times 1 which is [00:11:07] 1 which is so now if i try to put a 1 in this [00:11:10] so now if i try to put a 1 in this position [00:11:12] position now [00:11:12] now um this uh potential says one while the [00:11:16] um this uh potential says one while the others say two [00:11:18] others say two so now that has a weight of eight [00:11:21] so now that has a weight of eight so now to get the probability of x i [00:11:23] so now to get the probability of x i equals v [00:11:24] equals v given everything else i'm simply going [00:11:27] given everything else i'm simply going to sum up and normalize so i have 2 and [00:11:30] to sum up and normalize so i have 2 and 8 here the normalization constant [00:11:33] 8 here the normalization constant is 10. so i get probabilities 0.2 and [00:11:36] is 10. 
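That local weight computation can be sketched as a few lines of Python. This is a minimal sketch under the lecture's stated assumptions (a pairwise factor worth 2 when neighboring pixels agree and 1 when they disagree); the neighbor configuration and the name pixel_weight are made up for illustration.

```python
import random

def pixel_weight(v, neighbors):
    """Unnormalized weight for assigning value v to one pixel:
    each neighboring factor contributes 2 if the values agree, 1 if not."""
    w = 1
    for n in neighbors:
        w *= 2 if v == n else 1
    return w

# Hypothetical neighborhood matching the lecture's numbers:
# one neighbor is currently 0, three neighbors are currently 1.
neighbors = [0, 1, 1, 1]
w0 = pixel_weight(0, neighbors)  # 2 * 1 * 1 * 1 = 2
w1 = pixel_weight(1, neighbors)  # 1 * 2 * 2 * 2 = 8
z = w0 + w1                      # normalization constant: 10
p1 = w1 / z                      # P(pixel = 1 | everything else) = 0.8

# Gibbs sampling then sets the pixel to 1 with probability 0.8, else to 0.
new_value = 1 if random.random() < p1 else 0
```

Note that only the factors touching this pixel enter the computation, which is exactly why each Gibbs update is cheap even when the full assignment is large.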
so i get probabilities 0.2 and 0.8 [00:11:37] 0.8 now given this distribution i'm going to [00:11:39] now given this distribution i'm going to set [00:11:40] set this value to [00:11:42] this value to 1 with probability of 0.8 and 0 with [00:11:45] 1 with probability of 0.8 and 0 with probability 0.2 [00:11:46] probability 0.2 and then i'm going to keep on going [00:11:50] so here is a fun little demo of give [00:11:53] so here is a fun little demo of give sampling for image noise that runs in [00:11:56] sampling for image noise that runs in your browser [00:11:57] your browser okay so the idea is that here is an [00:11:59] okay so the idea is that here is an image and if you [00:12:01] image and if you uh hit control enter here [00:12:04] uh hit control enter here um you'll see that this is the input to [00:12:07] um you'll see that this is the input to the system so we have black pixels and [00:12:11] the system so we have black pixels and red pixels these are the observed pixels [00:12:14] red pixels these are the observed pixels and white [00:12:16] and white pixels are unobserved and these are the [00:12:18] pixels are unobserved and these are the ones that we want to fill in [00:12:20] ones that we want to fill in so there's a bunch of settings which [00:12:22] so there's a bunch of settings which i'll talk to about in a second but if [00:12:24] i'll talk to about in a second but if you click here you can see how get a [00:12:27] you click here you can see how get a feeling for what give sampling is doing [00:12:30] feeling for what give sampling is doing each frame here each iteration is a full [00:12:34] each frame here each iteration is a full pass over all [00:12:35] pass over all the pixels and you can see that it's [00:12:37] the pixels and you can see that it's kind of dancing around because it's [00:12:39] kind of dancing around because it's trying to explore [00:12:40] trying to explore different uh assignments [00:12:44] different uh assignments so 
one thing you can do is um you can [00:12:47] so one thing you can do is um you can set show marginals equals true [00:12:49] set show marginals equals true and what this does is that instead of [00:12:52] and what this does is that instead of visualizing the assignment at a [00:12:54] visualizing the assignment at a particular iteration for each of his [00:12:57] particular iteration for each of his pixel here i'm actually visualizing the [00:13:00] pixel here i'm actually visualizing the marginal probability estimate so this is [00:13:03] marginal probability estimate so this is in general going to be a number between [00:13:04] in general going to be a number between 0 and 1 which is represented as a shade [00:13:07] 0 and 1 which is represented as a shade between black and red here [00:13:09] between black and red here so this in some sense is the kind of [00:13:11] so this in some sense is the kind of best guess at what the reconstruction [00:13:14] best guess at what the reconstruction is [00:13:16] is so there are a number of things you can [00:13:18] so there are a number of things you can play with so for example the fraction of [00:13:20] play with so for example the fraction of missing pixels if i reduce this to let's [00:13:22] missing pixels if i reduce this to let's say 0.3 [00:13:24] say 0.3 then [00:13:26] then you know the problem becomes easier and [00:13:28] you know the problem becomes easier and you can see that the reconstruction gets [00:13:30] you can see that the reconstruction gets some you know pretty reasonable [00:13:32] some you know pretty reasonable results another fun thing you can play [00:13:35] results another fun thing you can play with is um well actually let me let me [00:13:38] with is um well actually let me let me bring down the map [00:13:40] bring down the map bring up the missing fraction to one [00:13:43] bring up the missing fraction to one okay so that means i don't see any pixel [00:13:45] okay so that means i don't 
see any pixel so [00:13:47] so here um this is just going to be [00:13:49] here um this is just going to be actually i mean let me do [00:13:51] actually i mean let me do show margins equals false [00:13:53] show margins equals false oops [00:13:58] so here you can see kind of just blind [00:14:00] so here you can see kind of just blind samples from [00:14:02] samples from the model [00:14:03] the model okay and [00:14:05] okay and if i pop up the coherence if i bump it [00:14:08] if i pop up the coherence if i bump it down [00:14:09] down um then you'll see kind of a more random [00:14:11] um then you'll see kind of a more random pattern [00:14:13] pattern if i bump it up to 10 [00:14:15] if i bump it up to 10 then you'll see kind of more coherence [00:14:18] then you'll see kind of more coherence so remember this is kind of like the [00:14:19] so remember this is kind of like the phase transitions that we saw for the [00:14:22] phase transitions that we saw for the easy [00:14:24] easy okay so i will let you [00:14:26] okay so i will let you play with this on your phone [00:14:29] play with this on your phone let me just conclude here uh actually [00:14:32] let me just conclude here uh actually one thing before we [00:14:33] one thing before we so [00:14:34] so let me try to go back to iterated [00:14:36] let me try to go back to iterated conditional modes and compare that with [00:14:38] conditional modes and compare that with give samples both of them have the same [00:14:41] give samples both of them have the same kind of template you're working with [00:14:43] kind of template you're working with complete assignments and you're going [00:14:44] complete assignments and you're going through each variable and updating the [00:14:46] through each variable and updating the assignment to that variable one at a [00:14:49] assignment to that variable one at a time [00:14:49] time but there's a few differences here [00:14:52] but there's a few differences here one 
the first salient one is that [00:14:55] one the first salient one is that idea of conditional modes was for [00:14:56] idea of conditional modes was for solving csps where we're trying to find [00:14:58] solving csps where we're trying to find the maximum weight assignment dip [00:15:00] the maximum weight assignment dip sampling is for markup networks where [00:15:01] sampling is for markup networks where we're trying to compute marginal [00:15:03] we're trying to compute marginal probabilities [00:15:05] probabilities so as a consequence for icm [00:15:08] so as a consequence for icm we at each step we're choosing the value [00:15:13] we at each step we're choosing the value to sign to a variable which maximizes [00:15:16] to sign to a variable which maximizes its weight whereas in give sampling [00:15:19] its weight whereas in give sampling we're [00:15:20] we're using the weights to form a distribution [00:15:22] using the weights to form a distribution and sampling from that distribution [00:15:25] and sampling from that distribution in icm we notice that [00:15:27] in icm we notice that the algorithm does converge but often to [00:15:30] the algorithm does converge but often to a local optimum which is not the best [00:15:33] a local optimum which is not the best maximum weight assignment [00:15:35] maximum weight assignment for gift sampling [00:15:37] for gift sampling as you can see from these samples [00:15:38] as you can see from these samples there's no traditional notions of [00:15:40] there's no traditional notions of convergence then the samples are going [00:15:42] convergence then the samples are going to keep on changing and keep on changing [00:15:44] to keep on changing and keep on changing so the iterates are not the ones which [00:15:46] so the iterates are not the ones which are converging what is actually going to [00:15:48] are converging what is actually going to converge are the marginal estimates [00:15:52] converge are the marginal 
estimates and [00:15:53] and in under some technical assumptions [00:15:56] in under some technical assumptions these estimates are actually going to [00:15:58] these estimates are actually going to converge to the correct answer [00:16:00] converge to the correct answer we saw that for object tracking it did a [00:16:02] we saw that for object tracking it did a pretty good job there [00:16:04] pretty good job there um but there were some kind of technical [00:16:05] um but there were some kind of technical conditions um one sufficient condition [00:16:08] conditions um one sufficient condition is that all the weights uh be [00:16:10] is that all the weights uh be positive [00:16:12] positive but more [00:16:13] but more uh generally what we need is that for [00:16:16] uh generally what we need is that for the probability of going from one [00:16:18] the probability of going from one assignment to another assignment via [00:16:19] assignment to another assignment via give sampling has positive probability [00:16:21] give sampling has positive probability because if you have two disconnected um [00:16:25] because if you have two disconnected um regions then you can't if you start keep [00:16:28] regions then you can't if you start keep sampling at one particular point then [00:16:29] sampling at one particular point then you will never reach the other point [00:16:32] you will never reach the other point so one important caveat is skip sampling [00:16:36] so one important caveat is skip sampling is wonderful but [00:16:37] is wonderful but it in the worst case it does take [00:16:39] it in the worst case it does take exponential time so these are really [00:16:41] exponential time so these are really computing margin probabilities is a [00:16:42] computing margin probabilities is a really hard problem and gibb sampling is [00:16:45] really hard problem and gibb sampling is just you know a heuristic with some [00:16:48] just you know a heuristic with some nice 
[00:16:49] nice asymptotic guarantee [00:16:53] so wrapping up [00:16:54] so wrapping up um we looked at [00:16:56] um we looked at computing the marginal probabilities of [00:17:00] computing the marginal probabilities of a markov network [00:17:02] a markov network and we saw that gibbs sampling did this [00:17:06] and we saw that gibbs sampling did this by sampling one variable at a time [00:17:09] by sampling one variable at a time and it counts visitations [00:17:12] and it counts visitations to each of the values for a given [00:17:15] to each of the values for a given variable [00:17:17] variable and it's one of these kind of [00:17:18] and it's one of these kind of astonishing things that give sampling [00:17:21] astonishing things that give sampling is so carefully constructed that it [00:17:23] is so carefully constructed that it actually kind of works and you can prove [00:17:25] actually kind of works and you can prove lots of interesting theorems about it [00:17:28] lots of interesting theorems about it finally gibbs family is just the first [00:17:31] finally gibbs family is just the first taste of a much more broad class of [00:17:34] taste of a much more broad class of techniques called markov chain monte [00:17:36] techniques called markov chain monte carlo which are used to [00:17:40] carlo which are used to produce [00:17:41] produce much kind of richer ways of estimating [00:17:44] much kind of richer ways of estimating probabilities in markup [00:17:47] probabilities in markup all right that's the end of this module ================================================================================ LECTURE 032 ================================================================================ Bayesian Networks 1 - Overview | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=fA7zP6EcVdw --- Transcript [00:00:05] hi in this module i'm going to talk [00:00:07] hi in this module i'm going to talk about bayesian networks and new 
modeling [00:00:09] about bayesian networks and new modeling paradigm [00:00:10] paradigm so we have talked about two types of [00:00:12] so we have talked about two types of variable based models the first was [00:00:14] variable based models the first was constraint satisfaction problems where [00:00:16] constraint satisfaction problems where the objective is to find the maximum [00:00:18] the objective is to find the maximum weight assignment given a factor graph [00:00:20] weight assignment given a factor graph then we talked about markup networks [00:00:22] then we talked about markup networks where we used factor graphs to define a [00:00:24] where we used factor graphs to define a joint probability distribution over [00:00:26] joint probability distribution over assignments and we were computing [00:00:28] assignments and we were computing marginal probabilities [00:00:31] marginal probabilities now we're going to talk about bayesian [00:00:32] now we're going to talk about bayesian networks where we still define a [00:00:34] networks where we still define a distribution over [00:00:36] distribution over a set of random variables using a factor [00:00:38] a set of random variables using a factor graph but now the factors are going to [00:00:40] graph but now the factors are going to have special meaning [00:00:42] have special meaning the bayesian networks were developed by [00:00:43] the bayesian networks were developed by judea pearl in the mid 1980s and really [00:00:47] judea pearl in the mid 1980s and really have evolved into the more general [00:00:49] have evolved into the more general notion of generative modeling that we [00:00:51] notion of generative modeling that we see today in machine [00:00:53] see today in machine so quickly before diving into vision [00:00:56] so quickly before diving into vision networks it's helpful to compare and [00:00:58] networks it's helpful to compare and contrast with markov networks [00:01:01] contrast with markov 
networks so both [00:01:02] so both are going to define a probability [00:01:03] are going to define a probability distribution over assignments to a set [00:01:06] distribution over assignments to a set of random variables [00:01:08] of random variables but the way that each approaches this is [00:01:10] but the way that each approaches this is very different [00:01:11] very different so if you're defining a markov network [00:01:14] so if you're defining a markov network you tend to think in terms of specifying [00:01:17] you tend to think in terms of specifying a set of preferences [00:01:18] a set of preferences and you throw these factors encoding [00:01:21] and you throw these factors encoding these preferences into [00:01:23] these preferences into the markov network so for example last [00:01:25] the markov network so for example last time we just threw [00:01:26] time we just threw in the transition factor and observation [00:01:28] in the transition factor and observation vector for the object tracking example [00:01:31] vector for the object tracking example so the bayesian network is going to [00:01:33] so the bayesian network is going to require a more coordinated approach [00:01:36] require a more coordinated approach so in a bayesian network the factors are [00:01:37] so in a bayesian network the factors are going to be local conditional [00:01:39] going to be local conditional distributions as we'll see later and we [00:01:42] distributions as we'll see later and we really think about a generative process [00:01:44] really think about a generative process by which each of these variables is set [00:01:47] by which each of these variables is set based on other variables in turn [00:01:52] so there are many applications of [00:01:54] so there are many applications of bayesian networks um or more generally [00:01:56] bayesian networks um or more generally generated models [00:01:58] generated models so i'll just go through a couple of them [00:02:00] 
so i'll just go through a couple of them here so the first one is topic modeling [00:02:02] here so the first one is topic modeling where the goal is to discover hidden [00:02:05] where the goal is to discover hidden structure in a large collection of [00:02:07] structure in a large collection of documents so an example of topic [00:02:09] documents so an example of topic modeling is latent dirichlet allocation [00:02:11] modeling is latent dirichlet allocation or lda [00:02:13] or lda and lda posits that each document is [00:02:16] and lda posits that each document is generated by [00:02:18] generated by drawing a mixture of topics and then [00:02:20] drawing a mixture of topics and then generating the words given those topics [00:02:24] generating the words given those topics another interesting example is this idea [00:02:27] another interesting example is this idea of vision as inverse graphics [00:02:29] of vision as inverse graphics so much of computer vision today [00:02:31] so much of computer vision today is [00:02:33] is taking images and processing them in [00:02:35] taking images and processing them in some way to generate semantic [00:02:37] some way to generate semantic descriptions such as object categories [00:02:39] descriptions such as object categories or scene descriptions [00:02:41] or scene descriptions so vision is inverse graphics takes a [00:02:44] so vision is inverse graphics takes a very different approach [00:02:46] very different approach where we specify using laws of physics a [00:02:50] where we specify using laws of physics a graphics engine that can generate an [00:02:52] graphics engine that can generate an image given some semantic description [00:02:54] image given some semantic description for example a 3d model of an object [00:02:57] for example a 3d model of an object and then given this model [00:03:00] and then given this model computer vision is just [00:03:02] computer vision is just inverse graphics where we're trying 
to [00:03:05] inverse graphics where we're trying to recover the semantic description [00:03:08] recover the semantic description using [00:03:09] using the image as input [00:03:11] the image as input so this is an example of inference on [00:03:14] so this is an example of inference on this generative model [00:03:16] this generative model so while this idea hasn't really been [00:03:19] so while this idea hasn't really been able to scale the scale past some [00:03:21] able to scale the scale past some limited examples it's i think a very [00:03:24] limited examples it's i think a very tantalizing idea nonetheless [00:03:26] tantalizing idea nonetheless so switching gears a little bit let's [00:03:28] so switching gears a little bit let's talk about communication networks [00:03:31] talk about communication networks so [00:03:32] so in the communication networks nodes must [00:03:34] in the communication networks nodes must send messages [00:03:36] send messages just a sequence of bits to each other [00:03:38] just a sequence of bits to each other but these bits can get corrupted along [00:03:41] but these bits can get corrupted along the way due to [00:03:42] the way due to physics [00:03:43] physics so the idea behind error correcting [00:03:45] so the idea behind error correcting codes [00:03:46] codes more in particular these things called [00:03:48] more in particular these things called low density parity codes is that the [00:03:50] low density parity codes is that the sender sends a random parity checks on [00:03:53] sender sends a random parity checks on the data bits [00:03:55] the data bits and then the receiver obtains a noisy [00:03:57] and then the receiver obtains a noisy version of both the data and the parity [00:03:59] version of both the data and the parity bits the bayesian network defines how [00:04:02] bits the bayesian network defines how the original bits are related to the [00:04:05] the original bits are related to the noisy bits and 
then the receiver can use [00:04:08] noisy bits and then the receiver can use bayesian inference to compute and [00:04:10] bayesian inference to compute and recover the original bits so this is [00:04:12] recover the original bits so this is actually a very effective idea that's [00:04:13] actually a very effective idea that's used in practice [00:04:16] used in practice the final example is either [00:04:20] the final example is either controversial or a little bit grim [00:04:22] controversial or a little bit grim which i'll explain later so this this is [00:04:24] which i'll explain later so this this is a problem of dna matching [00:04:27] a problem of dna matching so there are two use cases of this [00:04:30] so there are two use cases of this one is in forensics so given dna found [00:04:33] one is in forensics so given dna found at a crime site [00:04:35] at a crime site even if the suspect's dna is not in the [00:04:38] even if the suspect's dna is not in the database [00:04:39] database one can still match this dna against the [00:04:42] one can still match this dna against the family members of a subject and here the [00:04:45] family members of a subject and here the bayesian network is structured along the [00:04:47] bayesian network is structured along the family tree [00:04:48] family tree and [00:04:49] and specifies the relationship between the [00:04:51] specifies the relationship between the family members dna due to using [00:04:54] family members dna due to using mendelian inheritance [00:04:56] mendelian inheritance so now while this technology has [00:04:58] so now while this technology has actually been used to solve a number of [00:05:00] actually been used to solve a number of crime cases there's definitely a lot of [00:05:03] crime cases there's definitely a lot of tricky ethical concerns about this [00:05:05] tricky ethical concerns about this expanded dna matching especially when an [00:05:08] expanded dna matching especially when an 
individual's decision to release their [00:05:10] individual's decision to release their own dna can impact the privacy of family [00:05:13] own dna can impact the privacy of family members [00:05:15] members the second use case is in disaster [00:05:17] the second use case is in disaster victim identification so after a big [00:05:20] victim identification so after a big airplane crash or some other disaster [00:05:23] airplane crash or some other disaster for example malaysia airlines crashed in [00:05:25] for example malaysia airlines crashed in ukraine in 2014 [00:05:27] ukraine in 2014 a victim's dna is found at the crash [00:05:29] a victim's dna is found at the crash site and is matched against the family [00:05:32] site and is matched against the family members using the same mechanism as i [00:05:34] members using the same mechanism as i just described to help identify victims [00:05:37] just described to help identify victims and these methods are very scalable [00:05:40] and these methods are very scalable which allows them to [00:05:42] which allows them to deal with well these unfortunate large [00:05:45] deal with well these unfortunate large crash sites [00:05:48] so why bayesian networks [00:05:51] so why bayesian networks well these days it's kind of hard not to [00:05:53] well these days it's kind of hard not to think about problems exclusively through [00:05:56] think about problems exclusively through the lens of standard supervised learning [00:05:58] the lens of standard supervised learning such as just train a deep neural network [00:06:00] such as just train a deep neural network on the pile of data [00:06:02] on the pile of data vision networks really operate in a very [00:06:04] vision networks really operate in a very different paradigm which offers several [00:06:06] different paradigm which offers several advantages that i want to underscore [00:06:08] advantages that i want to underscore here [00:06:10] here so the first [00:06:11] so 
So the first is that they can handle heterogeneously missing information. Normally, when you're doing standard supervised learning, your data is fairly homogeneous: you have training examples, input-output pairs, both at training and test time. But in cases where you have missing information, or you have auxiliary information, Bayesian networks can gracefully handle this missingness in a way that's a little bit more challenging for traditional supervised methods.

[00:06:42] The second is that Bayesian networks allow you to incorporate prior knowledge much more easily. For example, if you understand how Mendelian inheritance works on DNA, or you understand the laws of physics, then Bayesian networks provide a nice language for incorporating this information into your model. And now, using this model, you can actually learn from very few samples and extrapolate
beyond the training distribution, whereas in contrast many model-agnostic, low-inductive-bias methods, such as deep neural networks, require much more data to be effective.

[00:07:18] Because you're specifying prior knowledge, you can also interpret the variables inside the Bayesian network, and this can be useful for understanding why a model is making a certain decision. You can introspect and ask questions about any of the intermediate variables, and this just follows from the laws of probability.

[00:07:40] Finally, Bayesian networks are an important precursor to causal models. These are beyond the scope of this course, but they are extremely important, especially these days. They allow you to answer questions about interventions (for example, what would happen if we give this drug to this patient?) and counterfactuals (what would have happened if we had given this drug?).
[00:08:03] So these questions are extremely tricky and deep, and standard machine learning, or any method that views the world through just the lens of prediction, is really inadequate to answer them. We're not going to talk about this in this course, but I highly encourage you to explore this topic on your own.

[00:08:21] So finally, Bayesian networks obviously aren't a panacea. In many situations, often in the canonical AI applications such as vision, speech, and language, we actually have large datasets, we mostly care about prediction, and it's extremely hard to incorporate prior knowledge into your models in these very complex domains. So in these cases, Bayesian networks haven't been as successful and have largely been supplanted by deep learning approaches. But still, having Bayesian networks in your toolkit will allow you to use them effectively
when you discover the right problem.

[00:09:02] So in the remaining modules on Bayesian networks, I will first introduce Bayesian networks more formally, and then I'll talk about probabilistic programming, which is a way to define Bayesian networks using probabilistic programs; this is a really cool way to think about modeling.

[00:09:21] Then we'll turn to inference. I'll talk about what inference means: computing conditional and marginal probabilities. We're actually going to reduce the inference problem in Bayesian networks to the same problem of probabilistic inference in Markov networks, allowing us to leverage what we covered when we talked about Markov networks.

[00:09:44] Then we're going to specialize to hidden Markov models (HMMs), an important special case of Bayesian networks. We're going to show that the forward-backward algorithm can leverage the chain structure of an
HMM, allowing you to do exact probabilistic inference efficiently.

[00:09:59] Then we're going to talk about particle filtering, which allows you to do approximate inference and scale up to HMMs whose variables have larger domains.

[00:10:09] Finally, we're going to talk about learning in Bayesian networks. We're going to start with supervised learning, where all the variables are observed; this actually turns out to be quite easy, and you'll be pleasantly surprised. Then we're going to show you how to guard against overfitting using Laplace smoothing, and finally we're going to turn to cases where not all the variables are observed and introduce the EM algorithm, which will help us learn in such Bayesian networks.

[00:10:35] Okay, so let's jump in.

================================================================================
LECTURE 033
================================================================================
Bayesian Networks 2 - Definition | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=xvC6XmZmR_U
---
Transcript

[00:00:05] Hi. In this module I'm
going to present the formal definition of Bayesian networks, give a few examples, and then talk about a really interesting property called explaining away.

[00:00:15] Before we begin, I want to review some basic probability. Suppose we have some random variables: one called S, which represents whether there's sunshine, and another called R, which represents whether there's rain. You should think about a setting of values to S and R as capturing some state of the world. We don't know which state of the world we're in, so we're going to capture this uncertainty using a joint distribution.

[00:00:47] So the joint distribution over S and R, P(S, R), is equal to a table, and this table specifies, for every possible assignment to S and R, a probability associated with it. So for example, the
probability of no sun and no rain is 0.2; no sun and rain is 0.08; and so on and so forth. Notice that I'm using lowercase letters to denote values and uppercase letters to denote the random variables. Also notice that the quantity with lowercase values is a number, a probability, whereas the quantity with uppercase variables is a table.

[00:01:31] So the joint distribution captures everything that you really want to know; you can think about it as a probabilistic database that captures how the world works. And now we can use the joint distribution to answer all sorts of interesting questions.

[00:01:49] We can compute what is called a marginal distribution. The idea here is: suppose I'm interested only in whether there's sunshine; I don't care about whether it's raining. So we can compute P(S), and this is a table which specifies the possible values of S (0 and 1) and the marginal probability of
that particular value. So how do we compute this? We simply aggregate rows. In particular, we look at S = 0, look at the joint distribution, and match all the rows where S is 0 (these two) and add them up; that gives us 0.28. Then for S = 1, the matching rows are the last two rows, and that gives us 0.72. That's the marginal distribution over S.

[00:02:39] There's also another concept called the conditional distribution. Here the idea is: suppose I knew it was raining, so I'm going to condition on R = 1, and now I want to know the probability of sunshine. This is a table where I again specify the possible values of S (0 and 1), and I want to know the conditional probability of S given R = 1. The way I want to approach the conditional distribution is as follows.
[00:03:12] I have this condition R = 1; that means I'm going to effectively remove all the rows where R does not equal 1, and I'm left with these two. Now I'm going to simply normalize this distribution: I have 0.08 and 0.02, the sum is 0.1, so I divide by that sum, and that gives me 0.8 and 0.2 for the values of S given R = 1.

[00:03:47] So all we did was select the rows that match the condition and normalize to get the distribution. And now just a simple note: the normalization constant, which is the sum of these two, is actually the marginal probability of R = 1. You can check that the conditional, by definition, is equal to the joint divided by the marginal.

[00:04:21] So let's expand our example a little bit. Suppose we now have variables sunshine, rain, traffic, and autumn.
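Before moving on, the two operations on the two-variable table, aggregating rows for a marginal and selecting-then-normalizing for a conditional, can be sketched in a few lines of Python. This is a minimal sketch; the 0.7 entry for (sun, no rain) is inferred so that the marginals match the 0.28 / 0.72 stated in the lecture.

```python
# Joint distribution P(S, R) as a table: (s, r) -> probability.
joint = {
    (0, 0): 0.2,   # no sun, no rain
    (0, 1): 0.08,  # no sun, rain
    (1, 0): 0.7,   # sun, no rain (inferred entry)
    (1, 1): 0.02,  # sun, rain
}

# Marginal P(S): for each value s, aggregate the rows that agree on s.
p_s = {s: sum(p for (s2, _), p in joint.items() if s2 == s) for s in (0, 1)}
# p_s[0] = 0.28, p_s[1] = 0.72

# Conditional P(S | R = 1): keep the rows matching the condition, then normalize.
selected = {s: p for (s, r), p in joint.items() if r == 1}
z = sum(selected.values())                 # normalization constant = P(R = 1) = 0.1
p_s_given_r1 = {s: p / z for s, p in selected.items()}
# p_s_given_r1[0] = 0.8, p_s_given_r1[1] = 0.2
```

Note that the normalization constant `z` is exactly the marginal probability of the evidence, matching the remark that the conditional equals the joint divided by the marginal.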
[00:04:33] So we have a joint distribution over all four variables. And marginalization and conditioning are not mutually exclusive; we can actually define queries that involve both. Here is an example: suppose I know that there's traffic and that we're in the autumn quarter, and now we're interested in a particular query variable, in this case R. This question can be written as follows: the probability of the query variable R conditioned on the evidence T = 1, A = 1. The variables which are not mentioned here are said to be marginalized out; S is not mentioned here, so we're marginalizing out S.

[00:05:19] So in general, there are three sets of variables: the query variables, the conditioning variables, and the marginalized-out variables, which form a partition of all the variables in your system.
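To make this three-way partition concrete, here is a sketch of a generic query routine that selects rows matching the evidence, sums out every unmentioned variable, and normalizes. The four-variable joint below uses made-up numbers purely for illustration; the lecture does not give this table.

```python
import itertools

def query(joint, names, query_var, evidence):
    """P(query_var | evidence): select rows matching the evidence,
    marginalize out all other variables, then normalize."""
    totals = {}
    for assignment, p in joint.items():
        row = dict(zip(names, assignment))
        if all(row[v] == val for v, val in evidence.items()):
            totals[row[query_var]] = totals.get(row[query_var], 0.0) + p
    z = sum(totals.values())
    return {val: p / z for val, p in totals.items()}

names = ("S", "R", "T", "A")
# Hypothetical joint over the 16 assignments (illustrative numbers only).
raw = {vals: 1 + sum(v * (i + 1) for i, v in enumerate(vals))
       for vals in itertools.product((0, 1), repeat=4)}
total = sum(raw.values())
joint = {vals: w / total for vals, w in raw.items()}

# P(R | T = 1, A = 1); S is unmentioned, so it is marginalized out.
answer = query(joint, names, "R", {"T": 1, "A": 1})
```

The query variable, the conditioning variables `{T, A}`, and the marginalized-out variable `S` together cover all four variables, which is exactly the partition described above.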
[00:05:42] So now let's turn to a classic puzzle, which we will solve using Bayesian networks. Suppose that in the world there are unfortunate things such as earthquakes and burglaries, and suppose that they're independent events, and hopefully rare: each of them happens with probability epsilon, where epsilon is some small number. You've installed an alarm which will go off whenever either there's an earthquake or there's a burglary; you got some special deal where it's a two-in-one kind of alarm.

[00:06:15] So suppose you're away on vacation, and then you get a notification that your alarm went off. Now, does hearing that there's an earthquake on the radio or on your news feed increase, decrease, or keep constant the probability of a burglary? That is, if you know that there's an earthquake in addition to the alarm, how does that change your beliefs about a burglary?
[00:06:48] Now we could try to intuit the answer, and I would encourage you to do that and see if you're right, but sometimes this can be very slippery, because the right answer can be counterintuitive.

[00:07:03] You might think: well, earthquakes and burglaries, I said they're independent, so knowing that there's an earthquake, why should that change the probability of a burglary? That's one way to think about it, but that turns out to be wrong, and I'll show you why.

[00:07:18] So let's try to tackle this problem using Bayesian networks. We're going to define a joint distribution over earthquake, burglary, and alarm. I'll do this in the next slide, but first let's talk about the questions; let's convert this word problem into mathematical notation. The two things I want to compare are: what is the probability of there being a burglary given only that I heard an alarm,
[00:07:46] versus what is the probability of a burglary given that I heard an alarm and I heard that there's an earthquake? Is it smaller, is it the same, or is it larger? That's what I want to know.

[00:07:59] So now let us define the Bayesian network completely. There are going to be four steps to thinking about how to define a Bayesian network. First of all, let's figure out what the variables are. The variables are whether there's a burglary (B), whether there's an earthquake (E), and whether the alarm went off (A).

[00:08:22] Second, what we're going to do is model the dependencies between these variables using directed arrows. You can think about them as capturing causality, although that's not necessarily the case. These are meant to just capture qualitative relationships: here, the alarm is triggered either by the
burglary or an earthquake, so that seems sensible.

[00:08:55] So to make this qualitative relationship quantitative, I'm going to define a local conditional distribution for each variable, conditioned on its parents. Let's go through these examples. We have B, a variable; a local conditional distribution specifies, for each possible value of B, what its probability is. I said that the probability of burglary is epsilon, and that means the probability of no burglary is 1 minus epsilon. Then we look at E. E has no parents as well, so the probability of earthquake is epsilon and the probability of no earthquake is 1 minus epsilon.

[00:09:40] I can also write these conditional distributions as follows: I can write p(b) = epsilon · [b = 1] + (1 − epsilon) · [b = 0], where [·] denotes the indicator function.
So if I plug in b = 1, I'm going to get epsilon, and if I plug in b = 0, I'm going to get 1 − epsilon; and the same goes for p(e).

[00:10:02] So now, what is the probability of A given its parents? It's easiest to write it mathematically like this: p(a | b, e) = [a = (b ∨ e)], just the indicator of whether a equals b OR e. So this is a deterministic relationship, but I've lifted it into this probabilistic notation.

[00:10:26] I can also write this out as a table, where I specify, for every possible configuration of the parents and of A itself, what its probability is. So here, if b and e are 0: does a = 0 equal 0 OR 0? Yes, so this probability is 1. Does a = 1 equal 0 OR 0? No, so that's a 0. Does a = 0 equal 0 OR 1? That's also a no. And does a = 1 equal 0 OR 1? The answer is yes,
so that's a 1. You can fill in the rest of the table analogously.

[00:11:13] Okay, so now I have defined a local conditional distribution for each variable given its parents. And now the final step is to multiply all of these together, and that is defined as the joint distribution over all the random variables.

[00:11:35] Notice that I'm deliberately using two types of p here: lowercase p is used to specify the local conditional probabilities, and the blackboard uppercase P is reserved for the joint distribution, and also for the derived marginal and conditional distributions. So notice again that these local conditional distributions are just defined, whereas the joint distribution is derived from the local conditional distributions.

[00:12:10] All right, so the joint distribution, like I said, is simply the product of all the local conditional distributions.
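The construction just described, local conditional distributions multiplied together into a joint, can be sketched directly in code. This is a minimal sketch assuming epsilon = 0.05 (an arbitrary small value chosen for illustration), not the lecture's own implementation.

```python
from itertools import product

eps = 0.05  # assumed small probability of a burglary / an earthquake

def p_b(b):        # local conditional for B (no parents)
    return eps if b == 1 else 1 - eps

def p_e(e):        # local conditional for E (no parents)
    return eps if e == 1 else 1 - eps

def p_a(a, b, e):  # local conditional for A given its parents: [a = (b OR e)]
    return 1.0 if a == (b | e) else 0.0

# The joint distribution is the product of all local conditional distributions.
joint = {(b, e, a): p_b(b) * p_e(e) * p_a(a, b, e)
         for b, e, a in product((0, 1), repeat=3)}

assert abs(sum(joint.values()) - 1.0) < 1e-9  # it is a valid distribution
```

Because p(a | b, e) is deterministic, half of the eight rows get probability zero, and each remaining row's probability is just p(b) · p(e).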
[00:12:21] So if I work that out, I get this table over all possible assignments to B, E, and A, with their probabilities. Okay, so now I can work on the questions I'm asking. This is my probabilistic database; let's go query it.

[00:12:35] Let's warm up with something relatively simple: what is the marginal probability of B = 1? Remember how I compute a marginal probability: I look at B = 1, which selects these rows down here, and I simply add up the probabilities. There's epsilon · (1 − epsilon), and then, adding epsilon², that gives me epsilon − epsilon² + epsilon² = epsilon.

[00:13:10] Okay, so what about the probability of burglary conditioned on the alarm? Remember, for conditional distributions, I'm going to wipe out all the rows where A is not 1.
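Both the warm-up marginal and the conditional query being set up can be checked numerically against the same joint table; again a sketch with epsilon = 0.05 chosen arbitrarily.

```python
from itertools import product

eps = 0.05  # assumed value, for illustration

# Joint over (B, E, A): product of the local conditionals, with the
# deterministic alarm A = B OR E (same table as in the lecture).
joint = {(b, e, a): (eps if b else 1 - eps) * (eps if e else 1 - eps) * (a == (b | e))
         for b, e, a in product((0, 1), repeat=3)}

# Marginal P(B = 1): add up the rows where b = 1.
p_b1 = sum(p for (b, e, a), p in joint.items() if b == 1)
# = eps(1 - eps) + eps^2 = eps

# Conditional P(B = 1 | A = 1): select rows with a = 1, then normalize.
rows = {k: p for k, p in joint.items() if k[2] == 1}
p_b1_given_a1 = (sum(p for (b, e, a), p in rows.items() if b == 1)
                 / sum(rows.values()))
# = 1 / (2 - eps)
```

The same select-and-normalize step, run with the evidence A = 1 and E = 1, answers the final query posed next.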
with all these rows which [00:13:31] so i'm left with all these rows which are consistent with my evidence of equal [00:13:33] are consistent with my evidence of equal one [00:13:35] one and now [00:13:36] and now i'm going to look at [00:13:38] i'm going to look at um probability of b equals one so that [00:13:42] um probability of b equals one so that is are these two rows [00:13:44] is are these two rows and now i add [00:13:45] and now i add um epsilon 1 minus epsilon [00:13:49] um epsilon 1 minus epsilon plus [00:13:51] plus um [00:13:52] um epsilon squared [00:13:54] epsilon squared okay and i'm going to divide by the sum [00:13:57] okay and i'm going to divide by the sum of all these three things which is [00:14:00] of all these three things which is same as the numerator plus this [00:14:01] same as the numerator plus this additional one minus epsilon times [00:14:03] additional one minus epsilon times epsilon [00:14:04] epsilon if you do the math here you get uh one [00:14:07] if you do the math here you get uh one over two minus epsilon [00:14:10] over two minus epsilon okay so this intuitively makes sense um [00:14:13] okay so this intuitively makes sense um the prior probability of a burglary is [00:14:15] the prior probability of a burglary is small but if i hear alarm then this goes [00:14:18] small but if i hear alarm then this goes up to actually a little bit over 50 [00:14:20] up to actually a little bit over 50 percent [00:14:23] so now the final [00:14:26] so now the final question is what is the probability of [00:14:27] question is what is the probability of burglary given that i heard the alarm [00:14:31] burglary given that i heard the alarm and also i hear that there's an [00:14:33] and also i hear that there's an earthquake [00:14:34] earthquake okay so i'm conditioning on now a equals [00:14:37] okay so i'm conditioning on now a equals one and equals one so i'm going to wipe [00:14:39] one and equals one so i'm going to wipe out [00:14:40] 
out the rows where [00:14:42] the rows where e is zero [00:14:45] and now i am left with what's the [00:14:48] and now i am left with what's the probability of equals one so that is [00:14:51] probability of equals one so that is epsilon squared [00:14:53] epsilon squared divided by [00:14:55] divided by the sum over these two probabilities [00:14:57] the sum over these two probabilities which is epsilon squared plus one minus [00:14:59] which is epsilon squared plus one minus f squared epsilon and this gives me [00:15:03] f squared epsilon and this gives me if you do the math it gives me epsilon [00:15:07] if you do the math it gives me epsilon okay so this answers our question now [00:15:10] okay so this answers our question now um when i [00:15:13] um when i heard the alarm the probability [00:15:15] heard the alarm the probability of a burglary uh goes up rightfully but [00:15:19] of a burglary uh goes up rightfully but now [00:15:20] now i see that if there is an earthquake or [00:15:23] i see that if there is an earthquake or hear that there's an earthquake that [00:15:24] hear that there's an earthquake that probably goes down back to epsilon [00:15:28] probably goes down back to epsilon okay so the answer to the question is [00:15:30] okay so the answer to the question is that [00:15:31] that observing the earthquake does cause the [00:15:33] observing the earthquake does cause the problem of burglary to go down [00:15:36] problem of burglary to go down okay so let me actually work [00:15:39] okay so let me actually work convince you of this via this demo so [00:15:42] convince you of this via this demo so here um remember from this [00:15:45] here um remember from this before that we can define arbitrary [00:15:48] before that we can define arbitrary factor graphs including major networks [00:15:50] factor graphs including major networks using this tool so we have three [00:15:53] using this tool so we have three variables uh b e and a [00:15:55] variables 
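The three hand computations above can be checked mechanically by enumerating the joint table. The sketch below (plain Python, not the course's demo tool) builds the joint distribution for the alarm network from the local conditional distributions and answers the queries by summation; with epsilon = 0.05 it reproduces epsilon, 1/(2 − epsilon) ≈ 0.513, and epsilon.

```python
from itertools import product

eps = 0.05

def local(b, e, a):
    """Product of the local conditional distributions p(b) p(e) p(a | b, e)."""
    pb = eps if b else 1 - eps
    pe = eps if e else 1 - eps
    pa = 1.0 if a == (b or e) else 0.0   # the alarm is deterministic: a = b OR e
    return pb * pe * pa

# The joint distribution is a table over all assignments (b, e, a).
joint = {(b, e, a): local(b, e, a) for b, e, a in product([0, 1], repeat=3)}

def prob(query, evidence):
    """P(query | evidence): sum the consistent rows, normalize by the evidence rows."""
    rows = [(dict(zip("bea", r)), p) for r, p in joint.items()]
    den = sum(p for r, p in rows if all(r[v] == x for v, x in evidence.items()))
    num = sum(p for r, p in rows
              if all(r[v] == x for v, x in evidence.items())
              and all(r[v] == x for v, x in query.items()))
    return num / den

print(prob({"b": 1}, {}))                 # marginal: eps
print(prob({"b": 1}, {"a": 1}))           # 1 / (2 - eps)
print(prob({"b": 1}, {"a": 1, "e": 1}))   # back down to eps: explaining away
```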
[00:15:55] epsilon we're setting to 0.05 here. i'm going to define the factors, i.e., the local conditional distributions: the probability of B, the probability of E, and the probability of A given B and E. now i'm going to ask for the probability of B. if i step through this algorithm, i see that the probability of B is 0.05, which is epsilon. so now what happens when i condition on A? when i condition on A, i find that the probability of B conditioned on A = 1 is 0.51; remember, this is 1 / (2 minus epsilon). [00:16:46] finally, i'm going to also condition on the earthquake, and if i condition on the earthquake, i see that the probability of burglary goes down to 0.05, which is epsilon.

[00:17:05] okay, so what have we learned from this? you could write a flashy headline saying "earthquakes decrease burglaries." of course, this is a little tongue-in-cheek, because this is actually not a causal statement. you have to be careful: if you went in and caused some earthquakes — i don't know how you would do that, but supposing you did — it's not as if all the burglars are going to disappear. here "decrease" does not mean a causal effect; it just means that, given this evidence, the probabilities of various other variables change. so the punch line here is that dealing with all these probabilities and reasoning under uncertainty is quite slippery, and we need some sort of sound mathematical framework, such as Bayesian networks, to deliver the right answers.

[00:18:04] so this type of phenomenon is so important to Bayesian networks that it has a special name: it's called explaining away. in general, explaining away is when you have two (or more) causes that positively influence an effect; conditioned on the effect, further conditioning on one cause actually reduces the probability of the other cause. mathematically, this is written as: the probability of the other cause given the effect and one of the causes is less than the probability of that cause given just the effect alone. and this is true even if the causes are independent, which might be somewhat counterintuitive. this is kind of the hallmark of Bayesian networks; it's called a v-structure because it looks like a V.

[00:19:00] so you can rationalize this, if you want some intuition, as follows: you have this effect, and you observe A = 1, and now you're trying to seek an explanation for what caused this effect — is it B or E? just conditioning on A, well, it could be either one, so it's kind of like 50-50.
[00:19:25] but if i told you that one of the causes was actually activated, then that intuitively lessens the responsibility of the other: you don't really need this other cause to explain A, and that's why the probability of this other cause goes down. of course that is very hand-wavy, but you can rest assured that there are rigorous mathematical calculations behind it, which we just did.

[00:19:55] okay, so let's look at another example. this is kind of a toy medical diagnosis problem: suppose you are coughing and you have itchy eyes — do you have a cold, or is it something else? so remember there are four steps, so let's go through them in turn. the first step is to write down the random variables of interest. here we have cold (C), allergies (A), cough (H), and itchy eyes (I). so these are the variables C, A, H, and I.

[00:20:34] the second step is to draw arrows between them using prior knowledge. using really super-crude medical knowledge, i'm going to just declare that a cough could be due to either a cold or allergies, whereas itchy eyes are generally due to allergies alone, but not a cold.

[00:20:55] step three is to make this quantitative by defining local conditional distributions: for each variable, i'm going to write down a local conditional distribution given its parents. so: the probability of C (C has no parents), the probability of A (A has no parents), the probability of H given its parents C and A, and the probability of I given its parent A. okay, so i'm not going to bother to write down the actual probability distributions on the slide.

[00:21:30] step four is to multiply all these together to form the joint distribution over all the random variables. again, lowercase p is a
local conditional distribution, and blackboard P is the joint distribution.

[00:21:48] okay, so now i have this probabilistic database, and we can ask questions about it. so let's warm up, not exactly with this question but with a different question, which is: what is the probability that i have a cold if i was just coughing? okay, so let's look at this demo. here is the Bayesian network for medical diagnosis, where i've defined C, A, H, and I, and now i'm conditioning on H = 1 and I = 1 and asking for the probability of C, marginalizing out A. [00:22:32] it runs the variable elimination algorithm a few times — don't worry about that for now — and it produces the probability of C conditioned on H = 1 and I = 1, which is 0.13. sorry, i meant to only condition on H = 1; so let me do that again, and i get 0.28, so i'm going to write 0.28 here. and now, what is the probability when i condition on both H = 1 and I = 1? actually i already did this, but i'll just do it again: this is going to be 0.13.

[00:23:17] okay, so again you can rest assured that these calculations follow the laws of probability. and one thing i want to point out is that this is another case of explaining away, but slightly disguised. so here's how to think about it: i condition on I = 1, so i observe that i have itchy eyes. okay, itchy eyes are only connected to A, so that's only going to boost support for A — even though i don't condition on A, i'm getting more support for A. and now, having more support for A, A can explain the cough, which lessens the need for the cold; so that's why the probability of cold actually decreases compared to when i didn't observe itchy eyes.
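The lecture deliberately doesn't show the actual probability tables for this network, so the numbers in the sketch below are invented purely to illustrate the disguised explaining-away effect: with any reasonable choice of local conditional distributions of this shape, P(C = 1 | H = 1, I = 1) comes out smaller than P(C = 1 | H = 1).

```python
from itertools import product

# Hypothetical local conditional distributions -- NOT the lecture's numbers.
p_c = {1: 0.1, 0: 0.9}                        # p(cold)
p_a = {1: 0.1, 0: 0.9}                        # p(allergies)

def p_h(h, c, a):                             # p(cough | cold, allergies)
    on = 0.9 if (c or a) else 0.05
    return on if h == 1 else 1 - on

def p_i(i, a):                                # p(itchy eyes | allergies)
    on = 0.8 if a else 0.05
    return on if i == 1 else 1 - on

def prob_c1(evidence):
    """P(C=1 | evidence) by enumeration, marginalizing out A.

    Unobserved leaves (H or I) sum to 1 and can simply be skipped."""
    num = den = 0.0
    for c, a in product([0, 1], repeat=2):
        w = p_c[c] * p_a[a]
        for var, val in evidence.items():
            w *= p_h(val, c, a) if var == "h" else p_i(val, a)
        den += w
        if c == 1:
            num += w
    return num / den

cough_only = prob_c1({"h": 1})
cough_and_itchy = prob_c1({"h": 1, "i": 1})
print(cough_only, cough_and_itchy)   # itchy eyes lower the probability of a cold
```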
[00:24:15] okay, so you should be really kind of impressed by this kind of reasoning. it's quite subtle, even for this very small four-node Bayesian network; even qualitatively, you might find it hard to predict what will happen. so just imagine if you have a huge Bayesian network and you want to get quantitatively precise answers — you should be glad that we have Bayesian networks that allow you to answer these questions based on the laws of probability.

[00:24:54] so now let's define Bayesian networks formally. a Bayesian network is specified by a set of random variables, generically X1 through Xn, and it specifies a directed acyclic graph over these random variables; that specifies the dependencies qualitatively. and then we specify a local conditional distribution for each variable Xi given the parents of Xi. and when you multiply all these local conditional distributions together, you get the joint distribution over all the random variables. okay, so again we're using lowercase p to denote local conditional distributions and blackboard P to denote the joint distribution.

[00:25:57] so now we can look at probabilistic inference more formally as well. as always, you're given as input a Bayesian network specifying the joint distribution — this is my probabilistic database. i get some evidence, where a subset of the variables E has been observed to take on particular values, little e, and i'm interested in a set of query variables Q, which is another subset of the variables. so now probabilistic inference produces
the probability of Q conditioned on the evidence. and just to be very precise, what this means is the probability that Q = q given E = e, for each value little q. for example, "if i have a cough and itchy eyes, do i have a cold?" is expressed as the probabilistic inference question: what is the probability of a cold conditioned on coughing and itchy eyes?

[00:27:07] so this is the formal definition of probabilistic inference. the bad news is that computing this is actually going to turn out to be very computationally intractable, but we'll see algorithms that can tackle it either approximately, or exactly in special cases.

[00:27:25] so in summary, we've introduced Bayesian networks. it's important to think about the basis of Bayesian networks: these random variables, which capture the state of the world. we have directed edges between these variables, which represent directional dependencies. quantitatively, we define a local conditional distribution for each variable conditioned on its parents, and we multiply all those together to produce a joint distribution. now this joint distribution is my probabilistic database, where i can ask questions about the world — and this is the process of probabilistic inference. and hopefully, through the alarm and the medical diagnosis examples, you can appreciate how the Bayesian network framework captures certain types of reasoning patterns, such as explaining away, which might be intuitive or counterintuitive — but you can rest well at night, because this is all based on the laws of probability. okay.

================================================================================ LECTURE 034 ================================================================================ Bayesian Networks 3 - Probabilistic Programming | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=ZVk8y1zVoD4 --- Transcript [00:00:05] hi, in this module
i'm going to talk about probabilistic programming, a new way to think about defining Bayesian networks through the lens of writing programs, and this is really going to highlight the generative-process aspect of Bayesian networks.

[00:00:20] so recall that a Bayesian network is defined by a set of random variables; there are directed edges between the random variables that capture qualitative relationships; then, for every variable, we define a local conditional distribution conditioned on the parents of that variable; you multiply all these together and you get the joint distribution over all the random variables. and then, given this joint distribution as a probabilistic database, you can go and do probabilistic inference and answer all sorts of questions.

[00:00:56] so what we're going to focus on today is how to write down this joint distribution, or the Bayesian network, and we're going to look at it through the lens of programs. let's go through this example: let me write a short program that i claim is going to be equivalent to writing down either this equation or drawing this graph. so here it goes. first, i'm going to draw B from a Bernoulli distribution — you can think of Bernoulli as just a function that, when you call it, returns 1 (true) with probability epsilon. so B is going to be set to 1, or true, with probability epsilon. i'm going to independently do the same for E, and then finally i'm going to set A = B or E. so if i run this program, it's going to produce a setting of A, B, and E. so in general, a probabilistic program is just a randomized program such that, if you run it, it sets the random variables — in particular, it produces an assignment to the random variables.
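This three-line program can be written down directly. As a sketch (rejection sampling is my addition here, not the course demo), conditioning on evidence amounts to keeping only the runs consistent with it; with epsilon = 0.05, the estimate of P(B = 1 | A = 1) should land near 1/(2 − epsilon) ≈ 0.51:

```python
import random

eps = 0.05

def run_program():
    """One run of the probabilistic program: returns an assignment (b, e, a)."""
    b = 1 if random.random() < eps else 0   # b ~ Bernoulli(eps)
    e = 1 if random.random() < eps else 0   # e ~ Bernoulli(eps), independently
    a = b | e                               # a = b or e
    return b, e, a

# Rejection sampling: keep only runs consistent with the evidence a = 1,
# then read off the fraction of kept runs that have b = 1.
random.seed(0)
kept = [b for b, e, a in (run_program() for _ in range(200_000)) if a == 1]
print(sum(kept) / len(kept))   # close to 1 / (2 - eps)
```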
[00:02:16] so while you can run the program, it's useful to think about the program itself as just a mathematical construct that's used to define a distribution; in particular, the probability of the program producing a particular assignment is, by definition, the joint distribution over that assignment.

[00:02:38] so let's look at a more interesting example that showcases the convenience of using programming — this one's going to have a for loop in it. so let's say we're doing object tracking. we're going to assume that there's some object that starts at (0, 0), and then, for each time step 1 through n, with probability alpha i'm going to go right — x_(i−1) is the previous location, and i'm going to add (1, 0) to it — or, with probability 1 minus alpha, i'm going to go down. so here is the
Bayesian network corresponding to this probabilistic program; you can see that each x_i depends only on x_(i−1). the cool part is that this is a program and we can actually run it — this is implemented in JavaScript behind the scenes. you click run with alpha = 0.5, and each run produces an assignment to the random variables x1, x2, x3, x4, and so on, and we can visualize them. and you can play with alpha: let's make this 0.1 — i actually need to press ctrl-enter to save — if it's 0.1, then all the trajectories are going to be over here, and if it's 0.9, then the trajectories go the other way.

[00:04:22] so this program specifies what is called a Markov model, which is a special case of a Bayesian network where we have a chain: each variable depends only on the previous one.
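The tracking program above can be sketched in a few lines. One assumption here: the lecture doesn't spell out what "go down" adds to the location, so I take it to be (0, 1), mirroring the (1, 0) for "go right":

```python
import random

def sample_trajectory(n, alpha, seed=None):
    """One run of the tracking program: a chain x_0 .. x_n where each step
    goes right (+1, 0) with probability alpha, else down (+0, 1) (assumed)."""
    rng = random.Random(seed)
    x, y = 0, 0
    traj = [(x, y)]                 # the object starts at (0, 0)
    for _ in range(n):
        if rng.random() < alpha:
            x += 1                  # go right with probability alpha
        else:
            y += 1                  # otherwise go down
        traj.append((x, y))
    return traj

# With alpha = 0.9 most steps go right; with alpha = 0.1 most go down.
print(sample_trajectory(10, 0.9, seed=0))
```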
With this Markov model we can ask particular questions [00:04:42], for example: what are the possible trajectories given the evidence x10 = 2? Here I'm going to condition on x10 = 2, and if I run this, I'm sampling from all the program traces, restricted to only those where x10 is clamped to 2. So this is a way to visualize the conditional distribution of a probabilistic program.
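Conditioning by restricting program traces can be sketched as simple rejection sampling (the chain dynamics below are an assumption, as before; clamping x10 to 2 follows the lecture's example):

```python
import random

def sample_chain(alpha, n, rng):
    # assumed dynamics: up with probability alpha, else down
    xs = [0]
    for _ in range(n):
        xs.append(xs[-1] + (1 if rng.random() < alpha else -1))
    return xs

def condition(alpha=0.5, n=10, value=2, num_samples=10_000, seed=0):
    """Approximate p(trajectory | x_n = value) by rejection: run the
    program many times, keep only traces where the evidence holds."""
    rng = random.Random(seed)
    traces = (sample_chain(alpha, n, rng) for _ in range(num_samples))
    return [t for t in traces if t[-1] == value]

# every surviving trace has x10 clamped to the evidence value
kept = condition()
```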
[00:05:16] So now I'm going to quickly go through a set of examples of Bayesian networks, using probabilistic programs to write them down. This will be a fairly broad and quick overview. One application is language modeling, which is often used to score sentences for speech recognition or machine translation. Here is the probabilistic program: for each position i in the sentence, we generate a word x_i given x_{i-1}. In NLP this is called a bigram model, or more generally an n-gram model. So here we generate x1, maybe that's "wreck"; then x2 given x1, maybe that's "a"; x3 given x2, that's "nice"; and x4 given x3, that's "beach".
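A bigram model is easy to sample from given a table of conditional word probabilities. The tiny table below uses made-up probabilities, with "-BEGIN-" as an assumed start token, echoing the lecture's example:

```python
import random

# P(next word | previous word); all numbers are invented for illustration
BIGRAM = {
    "-BEGIN-":   {"wreck": 0.7, "recognize": 0.3},
    "wreck":     {"a": 1.0},
    "a":         {"nice": 0.8, "beach": 0.2},
    "nice":      {"beach": 1.0},
    "recognize": {"speech": 1.0},
}

def draw(dist, rng):
    # sample a key from a {key: probability} table
    r, acc = rng.random(), 0.0
    for word, p in dist.items():
        acc += p
        if r < acc:
            return word
    return word  # guard against floating-point round-off

def sample_sentence(rng, max_len=4):
    words, prev = [], "-BEGIN-"
    for _ in range(max_len):
        if prev not in BIGRAM:   # reached a word with no successors
            break
        prev = draw(BIGRAM[prev], rng)
        words.append(prev)
    return words

print(sample_sentence(random.Random(0)))
```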
[00:06:15] Here is an example of object tracking, which we're actually going to study at length in future modules. This is called a hidden Markov model. For every time step t = 1 to T, I generate an object location h_t; for example, for h1 I might generate (3, 1). Then I also generate a sensor reading e_t given h_t: given h1, I generate e1, which might be something like just the sum of the coordinates, for example. Then I move to the next time step: generate h2 given h1, maybe that's (3, 2); generate a sensor reading, the sum of its coordinates; and so on, generating h3 and e3, then h4 and e4, then h5 and e5. So that specifies the joint distribution over these object locations and sensor readings, and a canonical question you might want to ask is: given the sensor readings, where is the object?
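The HMM's generative story can be sketched directly. The random-walk transition below is an assumption (the lecture doesn't specify it), while the sum-of-coordinates sensor reading follows the transcript:

```python
import random

def sample_hmm(T=5, seed=0):
    """Forward-sample hidden locations h_1..h_T and readings e_1..e_T."""
    rng = random.Random(seed)
    h = (3, 1)                      # assumed initial location
    hs, es = [], []
    for _ in range(T):
        hs.append(h)
        es.append(h[0] + h[1])      # e_t depends only on h_t
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        h = (h[0] + dx, h[1] + dy)  # h_{t+1} depends only on h_t
    return hs, es

hs, es = sample_hmm()
```

The inference question "given e_1..e_T, where is the object?" runs this story in reverse; later modules cover algorithms for it.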
[00:07:28] Here is a generalization of the HMM to allow for multiple-object tracking, called a factorial HMM. Now, at every time step I have two objects, a and b, and I generate the location of object o at time step t. For example, here I have h_{1,a} and h_{1,b}, and I generate a single sensor reading that depends on both objects: e1 is conditioned on both h_{1,a} and h_{1,b}. At the next time step, I generate the object locations for the two objects and then the sensor reading conditioned on those two locations; then I transition to the third time step and generate its sensor reading, then the fourth time step and its sensor reading. In general this defines a joint distribution over all the object locations for both objects as well as the corresponding sensor readings.

[00:08:46] Here is another classic example, called naive Bayes, which is often used for very fast classification. The way naive Bayes works is that we generate a class, or label, y; for example, in document classification I might generate that this document is going to be about travel. Then for each position in the document, I generate a word w_i: for the first one I might generate "beach", for the second word maybe "Paris", and so on, all the way up to w_L. The typical way you use these naive Bayes models is that you're given a text document, which is the sequence of words, and you ask for the label: what is this document about?
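Classification with naive Bayes just inverts the generative story with Bayes' rule: P(y | w_1..w_L) is proportional to P(y) times the product of P(w_i | y). A sketch with invented labels, vocabularies, and probabilities:

```python
# All labels, words, and probabilities below are made up for illustration.
PRIOR = {"travel": 0.5, "sports": 0.5}
WORD_GIVEN_Y = {
    "travel": {"beach": 0.4, "paris": 0.4, "game": 0.1, "team": 0.1},
    "sports": {"beach": 0.1, "paris": 0.1, "game": 0.4, "team": 0.4},
}

def classify(words):
    # P(y | words) is proportional to P(y) * prod_i P(w_i | y)
    scores = {}
    for y, prior in PRIOR.items():
        p = prior
        for w in words:
            p *= WORD_GIVEN_Y[y].get(w, 1e-6)  # tiny floor for unseen words
        scores[y] = p
    total = sum(scores.values())
    return {y: p / total for y, p in scores.items()}

posterior = classify(["beach", "paris"])
```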
[00:09:43] A fancier version of the naive Bayes model is called latent Dirichlet allocation (LDA). Here we assume that a document is not just about one topic but possibly multiple topics, so I generate a distribution over topics; call it alpha. Notice that this is actually a continuous random variable: alpha might take on a value that assigns probability 0.8 to travel and 0.2 to Europe. Then, for each position i in the document, I generate a topic z_i; here I might generate "travel" for z1. Then I generate a word given that topic, so w1 given z1 might be "beach". I move on to the next position, generate a topic, generate a word given that topic, and so on and so forth until I reach the end of the document. The typical way you would use LDA is that you're given a text document, the words here, and you ask: what topics is it about?
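LDA's per-document story can be sketched as follows. In full LDA the topic distribution is itself drawn from a Dirichlet prior; here it is fixed by hand, and all tables are invented for illustration:

```python
import random

def draw(dist, rng):
    # sample a key from a {key: probability} table
    r, acc = rng.random(), 0.0
    for k, p in dist.items():
        acc += p
        if r < acc:
            return k
    return k

def sample_document(theta, word_given_topic, length, rng):
    """For each position: draw a topic z_i ~ theta, then w_i ~ P(w | z_i)."""
    doc = []
    for _ in range(length):
        z = draw(theta, rng)
        w = draw(word_given_topic[z], rng)
        doc.append((z, w))
    return doc

theta = {"travel": 0.8, "europe": 0.2}       # per-document topic mixture
word_given_topic = {
    "travel": {"beach": 0.6, "hotel": 0.4},
    "europe": {"paris": 0.7, "euro": 0.3},
}
doc = sample_document(theta, word_given_topic, 6, random.Random(0))
```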
[00:10:55] I want to infer the topics for each of the words, but also the topic distribution for the document. Here is another example, which generalizes the Bayesian network that we actually saw in a previous module. In general, suppose you have a bunch of diseases. For each disease i we generate d_i, the activity of that disease: we might have pneumonia (generate a 1), cold, and malaria. And we have a set of symptoms, where for each symptom j we generate its activity s_j: we might have fever, which depends on the diseases; cough, which depends on the set of diseases; and vomiting, which depends on the diseases. Now, the way you typically use this Bayesian network is that a patient comes in and reports some symptoms.
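A generative sketch of this disease–symptom network, with all priors and conditional probabilities invented for illustration (each symptom's probability simply rises when any parent disease is active):

```python
import random

def sample_patient(rng):
    # each disease is an independent Bernoulli (invented prior 0.1)
    diseases = {d: int(rng.random() < 0.1)
                for d in ("pneumonia", "cold", "malaria")}
    active = any(diseases.values())
    # each symptom depends on the set of diseases (invented probabilities)
    symptoms = {s: int(rng.random() < (0.8 if active else 0.05))
                for s in ("fever", "cough", "vomiting")}
    return diseases, symptoms

diseases, symptoms = sample_patient(random.Random(0))
```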
[00:12:06] You then ask the question: what diseases might they have? I'll just point out that this is a case where missing information can be handled naturally: if you didn't record a particular symptom for a patient, you can just ignore that variable.

[00:12:27] Here is another example. The motivation is that you have a social network, and you want to analyze why certain people are connected with other people. The model is formally called a stochastic block model. The idea is that for each person we generate a type for that person; maybe we have three people: a politician, a scientist, and another scientist. Then, for every pair of people, we generate whether those two people are connected: e_ij is a boolean. This politician and this scientist might be connected, so there's a 1, and the generation of each edge depends only on the types
of the two people in consideration. [00:13:21] So persons two and three are scientists and they're connected, while this politician and this scientist are not connected. Remember, we are given the social network, which is just the connectivity structure, these e's, and we're asked: what is the probability of the people being of certain types?
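The stochastic block model's story, sketched with invented type priors and edge probabilities (same-type pairs assumed more likely to connect):

```python
import random

def sample_network(n, rng):
    # draw a type per person, then each edge depends only on the two types
    types = [rng.choice(("politician", "scientist")) for _ in range(n)]
    def p_edge(t1, t2):
        return 0.8 if t1 == t2 else 0.1  # invented probabilities
    edges = {(i, j): int(rng.random() < p_edge(types[i], types[j]))
             for i in range(n) for j in range(i + 1, n)}
    return types, edges

types, edges = sample_network(3, random.Random(0))
```

Inference runs the story backwards: given only the observed edges e_ij, infer the posterior over the hidden types.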
[00:13:50] So that was a whirlwind tour of a lot of popular Bayesian network architectures in the literature, but they all basically boil down to this one: there is a variable, or a set of variables, H, which is generated first and then gives rise to a set of variables E. The probabilistic program specifies a Bayesian network: running it gives you a joint assignment, and the probability of producing that joint assignment is the joint probability. There are many, many types of models, and I've only given you a very small sample of them, but what I want you to take away is a general paradigm: you come up with stories of how the quantities of interest H generate the data E that you observe. This is really the opposite of how you normally think about machine learning or classification, where you start with the inputs and then define a sequence of operations to produce the outputs. In Bayesian networks it's often reversed: you think about the quantities of interest first, how they might arise in the world, and then how the data is generated from those quantities of interest. So this paradigm might take a little bit of getting used to, but it should become natural after some practice. All right, that's it.

================================================================================ LECTURE 035 ================================================================================ Bayesian
Networks 4 - Probabilistic Inference | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=-dGOWB9Zh8s --- Transcript [00:00:05] Hi. In this module I'm going to talk about the general strategy for performing probabilistic inference in Bayesian networks. Recall that a Bayesian network consists of a set of random variables, for example cold, allergies, cough, and itchy eyes, and defines a directed acyclic graph over these random variables that captures the qualitative dependencies between them: for example, cough is caused by cold or allergies, and itchy eyes are caused by allergies alone. Quantitatively, the Bayesian network specifies a set of local conditional distributions, one for each variable x_i given its parents. So in this example I would have the probability of c, times the probability of a, times the probability of h given c and a, times the probability of i given a.
[00:01:04] When I multiply all these probabilities together, I get, by definition, the joint probability distribution over all of the random variables; in this case, a joint distribution over C, A, H, and I. You can think of the Bayesian network as defining this joint distribution, which is like a probabilistic database where you can answer questions about it, for example: what is the probability of c given h = 1 and i = 1?
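With concrete numbers for the four local conditional distributions (invented here for illustration; the lecture doesn't give values), the "probabilistic database" view is just enumeration over the joint:

```python
from itertools import product

# invented CPTs for the cold/allergies/cough/itchy-eyes example
p_c = {0: 0.8, 1: 0.2}                                         # P(c)
p_a = {0: 0.7, 1: 0.3}                                         # P(a)
p_h1 = {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}  # P(h=1|c,a)
p_i1 = {0: 0.05, 1: 0.9}                                       # P(i=1|a)

def joint(c, a, h, i):
    # by definition, the joint is the product of the local conditionals
    return (p_c[c] * p_a[a]
            * (p_h1[(c, a)] if h else 1 - p_h1[(c, a)])
            * (p_i1[a] if i else 1 - p_i1[a]))

total = sum(joint(*v) for v in product((0, 1), repeat=4))  # sums to 1

# query the database: P(c = 1 | h = 1, i = 1)
num = sum(joint(1, a, 1, 1) for a in (0, 1))
den = sum(joint(c, a, 1, 1) for c in (0, 1) for a in (0, 1))
answer = num / den
```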
[00:01:42] Generally, you have a Bayesian network; some of the variables you observe as evidence, for example H and I in this case, and another set of variables you're interested in, which are the query variables; here Q would be C. What we want to produce is the probability of the query variables conditioned on the evidence; normally this is the probability that Q = q, for each of the values of little q.

[00:02:18] The overarching strategy that we're going to take for performing inference in Bayesian networks is to convert them into Markov networks, which we already discussed inference for. Let's walk through this example. Recall that the joint distribution over the variables here is equal to simply the product of the local conditional distributions, by definition of the Bayesian network.
[00:02:55] But these local conditional distributions are non-negative quantities, so they can be interpreted as factors in a factor graph. So let's draw the factor graph. Here we have the same set of variables, and for every variable we have a factor corresponding to its local conditional distribution: the probability of c, the probability of a, the probability of h given c and a (which connects c, a, and h), and the probability of i given a. In the factor graph representation these are simply functions: this is a function that depends on c, a, and h, and the factor graph doesn't really care that it's a local conditional distribution. Now remember, in a Markov network we take a factor graph, multiply all the factors together, and divide by the normalization constant to get the product to sum to one. But notice that in this case the normalization constant is exactly one, because we had this equality from the definition of the Bayesian network.
[00:04:05] So Z has to be one in this case: a Bayesian network is just a Markov network with normalization constant one. That means we can take any Bayesian network, reinterpret it as a Markov network, and answer all sorts of marginal queries: for example, we can ask for the probability of a, or the probability of h, and so on. I'll just remind you that a single factor connects the child and all of its parents: notice that there are two edges here, c to h and a to h, but in the factor graph representation you should connect the parents and the child into one factor.

[00:04:52] There's only one thing missing from this picture, which is that often in Bayesian networks you want to condition on evidence. So let's condition on h and i. To do this, we're going to define a Markov network over the non-conditioned variables.
[00:05:14] In this case that's going to be the probability of C = c, A = a, conditioned on H = 1 and I = 1. What I'm going to do is just substitute the values of the evidence into the factors themselves. Here is the factor graph: I have only c and a left, and p(c) and p(a) are the same as before. Now we have the factor that depended on c, a, and h, but h is equal to one, so I don't need to represent h as a variable; and the same for i = 1, so I don't need to represent i as a variable either. Now I take these four factors and multiply them all together, which gives this factor graph, and I need to normalize by 1/Z. It's a different Z now: in this case Z is not one, because I'm conditioning on evidence. In particular, Z is going to be the probability of the evidence.
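Substituting the evidence and renormalizing can be written out directly. With invented CPT numbers (the lecture gives none), Z comes out to exactly the probability of the evidence:

```python
# invented CPTs for the cold/allergies/cough/itchy-eyes example
p_c = {0: 0.8, 1: 0.2}
p_a = {0: 0.7, 1: 0.3}
p_h1 = {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}  # P(h=1|c,a)
p_i1 = {0: 0.05, 1: 0.9}                                       # P(i=1|a)

def weight(c, a):
    # product of the four factors with h = 1 and i = 1 plugged in
    return p_c[c] * p_a[a] * p_h1[(c, a)] * p_i1[a]

Z = sum(weight(c, a) for c in (0, 1) for a in (0, 1))  # = P(h = 1, i = 1)
posterior = {(c, a): weight(c, a) / Z for c in (0, 1) for a in (0, 1)}
```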
[00:06:24] You can see this because this is a conditional distribution, and a conditional distribution is equal to the joint distribution divided by the marginal of the thing you're conditioning on, so Z has to be equal to the marginal probability of the evidence. But nonetheless this is a Markov network, and now we can again run any inference algorithm we like over it, for example Gibbs sampling.

[00:06:51] So let me actually do that here. Here is the medical diagnosis example: we define the variables c, a, h, and i, and we condition on h = 1 and i = 1. We're interested in the marginal probability of c, and we're going to run Gibbs sampling. Gibbs sampling, remember, takes an arbitrary factor graph or Markov network, goes through an assignment, reassigns each variable one at a time, and accumulates counts.
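A Gibbs sampler for this reduced network only has to resample c and a, since the evidence is baked into the factors. The CPT numbers below are invented, so the estimate converges to a different value than the demo's 0.13:

```python
import random

# invented CPTs, as before
p_c = {0: 0.8, 1: 0.2}
p_a = {0: 0.7, 1: 0.3}
p_h1 = {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}
p_i1 = {0: 0.05, 1: 0.9}

def weight(c, a):
    # unnormalized p(c, a | h = 1, i = 1)
    return p_c[c] * p_a[a] * p_h1[(c, a)] * p_i1[a]

def gibbs(steps=20_000, seed=0):
    rng = random.Random(seed)
    c, a, count_c1 = 0, 0, 0
    for _ in range(steps):
        # reassign c given a, then a given c, from the local conditionals
        w0, w1 = weight(0, a), weight(1, a)
        c = int(rng.random() < w1 / (w0 + w1))
        w0, w1 = weight(c, 0), weight(c, 1)
        a = int(rng.random() < w1 / (w0 + w1))
        count_c1 += c
    return count_c1 / steps

estimate = gibbs()
exact = ((weight(1, 0) + weight(1, 1))
         / sum(weight(c, a) for c in (0, 1) for a in (0, 1)))
```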
[00:07:33] Let me speed this up a little bit and do a thousand steps at a time. Now you can see that these counts converge to the right answer, about 0.13, the probability of c conditioned on h and i.

[00:07:54] So then we're kind of done: we have a Bayesian network, we condition on evidence, we form this reduced factor graph, or Markov network, and we just run Gibbs sampling. In some sense we are done, but I want to push this a little bit further and show how we can leverage the structure of Bayesian networks to optimize things. So let's take another example where we're now conditioning only on h. Let's go through the motions: we define the Markov network over the variables that we didn't condition on, given h = 1.
[00:08:46] And the normalization constant is the probability of the evidence. Now I can ask: what is the probability of C = 1 given H = 1? This is something I can just go and compute using Gibbs sampling, but the question is: can we reduce the Markov network before running inference? Because if we can make the Markov network a little bit smaller, then hopefully inference can be a bit faster.

[00:09:17] The answer is yes, and we're going to show this by doing a little bit of algebra. Here is the Bayesian network again, where I've conditioned on H. Let me compute the marginal distribution where I've marginalized out I, so I don't have I anymore. I can express this in terms of the probability of C, A, and I, where I simply sum out all possible values of I; this is just the definition of marginal probability. Now, using the definition of the Bayesian network, I can rewrite the joint distribution in terms of local conditional distributions.

[00:10:14] Now I make an observation: I'm summing over I, but none of the terms actually depends on I except for this last factor. So I can pull all the other terms out, or equivalently push the summation inside so that it's wrapped tightly around p(i | a). And what is the sum over i of p(i | a)? By the definition of local conditional distributions, it is exactly 1, so it gets dropped. Now I have this nicer form, and not only is it smaller; let's try to understand what it is. It's the probability of C, times the probability of A, times the probability of H = 1 given C and A. So it's as if this variable I didn't exist at all.

[00:11:14] This is a general idea behind Bayesian networks: you can throw away any unobserved leaves before running inference. This is very powerful because it connects marginalization over variables, which is generally an algebraic operation involving a lot of hard work, with leaf removal, which is a graph operation and much more intuitive. In general marginalization is hard, but when there are unobserved leaves of a Bayesian network it is trivial: just remove them.

[00:11:52] Here is another type of structure we can exploit, which is actually not specific to Bayesian networks; it shows up more generally in Markov networks. Let's take another example where we're conditioning on I this time. We're going to define the Markov network, and let's write down the query we're interested in: the probability of C = c given I = 1. Expanding it out based on the definition of marginal probability, I can write it in terms of the probability of C, A, and H, where I sum over all possible values of A and H, so I'm marginalizing out A and H here. By the definition of the Bayesian network, I can replace this with the local conditional distributions. Now, using the same trick as before, I notice that H is an unobserved leaf, so I can marginalize out H and that factor disappears; graphically, H disappears.

[00:12:58] Now I'm left with this Bayesian network, and notice that the only thing that depends on C is this p(c), so I can pull it out and rewrite the expression as p(c) times some mess. The nice thing in this case is that this mess is just a constant, because it doesn't depend on C. Moreover, because p(c) is a distribution and the left-hand side is a distribution, this constant is actually 1. Graphically, C and this A-I subgraph are disconnected, which means I can simply remove that part. So in general, I can throw away any disconnected components before running inference.

[00:14:04] Okay, so let's summarize. We've tackled the problem of how to perform probabilistic inference in Bayesian networks by reducing it to inference in Markov networks. To prepare the Markov network, we first condition on the evidence, which is tantamount to substituting the values of the evidence into the factors. Then we throw away any unobserved leaves (in this case H), and we throw away any disconnected components. These last two steps are just optimizations, which are totally optional, but they'll often save you some work.
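Both optimizations can be sanity-checked by brute-force enumeration. A small sketch with hypothetical CPT numbers (the identities hold for any valid numbers, which is the point):

```python
from itertools import product

# Hypothetical CPTs for the network p(c) p(a) p(h|c,a) p(i|a).
def p_c(c): return 0.1 if c else 0.9
def p_a(a): return 0.2 if a else 0.8
def p_h(h, c, a):
    p1 = 0.9 if (c or a) else 0.05
    return p1 if h else 1 - p1
def p_i(i, a):
    p1 = 0.8 if a else 0.1
    return p1 if i else 1 - p1

def joint(c, a, h, i):
    return p_c(c) * p_a(a) * p_h(h, c, a) * p_i(i, a)

B = (0, 1)

# Unobserved leaf: P(c | h=1) from the full model ...
full = [sum(joint(c, a, 1, i) for a, i in product(B, B)) for c in B]
p_full = full[1] / sum(full)
# ... equals the answer from the reduced model p(c) p(a) p(h=1|c,a), with I removed.
red = [sum(p_c(c) * p_a(a) * p_h(1, c, a) for a in B) for c in B]
p_red = red[1] / sum(red)

# Disconnected component: after removing the leaf H, C is disconnected from
# the A-I part, so P(c | i=1) is just the prior p(c).
ci = [sum(joint(c, a, h, 1) for a, h in product(B, B)) for c in B]
p_ci = ci[1] / sum(ci)
```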
[00:14:45] Then we define a Markov network over the remaining factors, and now we just have a factor graph where we can run your favorite inference algorithm. If it's very simple, as is the case here, you can just do it manually; if what remains is more complicated, then you can do something like Gibbs sampling. And that's the end.

================================================================================ LECTURE 036 ================================================================================ Bayesian Networks 5 - Forward-backward Algorithm | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=N-ZPbpJOQs0 --- Transcript

[00:00:05] Hi, in this module I'm going to introduce the forward-backward algorithm for performing exact and efficient inference in hidden Markov models, which are an important special case of Bayesian networks.

[00:00:17] So let's revisit our object tracking example, now through the lens of HMMs. Recall that at each time step i, there's an object at a particular position h_i. The object might have
gone through this trajectory, and at each position there's also a noisy observation: 0, 2, and 2.

[00:00:42] So let's formally define a probabilistic story for how these data might occur. We start with h1, the position of the object at time step one, and we generate this position uniformly at random: probability one third for each of the possible positions 0, 1, or 2.

[00:01:07] Then I'm going to transition into the second time step. In general, I look at h_{i-1} and generate h_i, which goes up with probability one quarter, stays the same with probability one half, and goes down with probability one quarter. Mathematically, h_i can be h_{i-1} - 1, the same, or h_{i-1} + 1, with those probabilities. This transition distribution is also used to generate h3 given h2.

[00:01:43] Now, at each time step I also have an emission: e1, e2, and e3. In general, I look at the actual position h_i at time step i and generate e_i according to essentially the same process: up with probability one quarter, the same with probability one half, and down with probability one quarter. This is the local conditional distribution, formally stated.

[00:02:13] Now I multiply all of the local conditional distributions together: we have the probability of the start position h1, the probability of h_i given h_{i-1} for each subsequent time step, times the probability of the noisy sensor reading e_i given the actual position h_i, for all time steps. This gives us the joint distribution over all the actual positions and sensor readings.

[00:02:47] So now let's ask questions about our hidden Markov model. There are two types of questions which are common.
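Before turning to those queries, here is the generative story above as code. A minimal sketch: the lecture doesn't specify what happens when a step would leave the position range, so out-of-range moves are simply assigned probability zero here, which leaves the local distributions unnormalized at the boundary.

```python
POSITIONS = (0, 1, 2)
STEP = {-1: 0.25, 0: 0.5, +1: 0.25}    # down 1/4, stay 1/2, up 1/4

def p_start(h1):                        # h1 is uniform over the positions
    return 1 / 3 if h1 in POSITIONS else 0.0

def p_trans(h_prev, h):                 # p(h_i | h_{i-1})
    return STEP.get(h - h_prev, 0.0)

def p_emit(e, h):                       # p(e_i | h_i): the same noise model
    return STEP.get(e - h, 0.0)

def hmm_joint(hs, es):
    """Joint p(h, e): the product of all the local conditional distributions."""
    p = p_start(hs[0]) * p_emit(es[0], hs[0])
    for i in range(1, len(hs)):
        p *= p_trans(hs[i - 1], hs[i]) * p_emit(es[i], hs[i])
    return p
```

For example, hmm_joint((1, 1, 2), (0, 2, 2)) multiplies the six local factors p(h1), p(e1 | h1), p(h2 | h1), p(e2 | h2), p(h3 | h2), p(e3 | h3).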
[00:02:57] One is called filtering, and the other is called smoothing. A filtering question is something like this: I'm interested in the object's location at a particular time step, h2, given some evidence, namely all the sensor readings I've seen up to that point.

[00:03:20] Smoothing is similar, except that in addition I condition on the future; so I might observe e3 = 2 as well.

[00:03:32] Notice that filtering is actually a special case of smoothing if we marginalize out unobserved leaves. To show this, suppose we have just this Bayesian network, or HMM, and I didn't observe e3. Then e3 is just an unobserved leaf, and I can marginalize it out by simply removing it. Now h3 is an unobserved leaf, and I can remove that as well. So this filtering query is actually a smoothing query where there is no future, because I don't observe the future.

[00:04:16] So now let us focus on smoothing queries, without loss of generality. The forward-backward algorithm is based on dynamic programming, and the key idea is to represent the set of all assignments using a lattice. This lattice is a directed acyclic graph, not to be confused with the actual HMM or Bayesian network. There's a start state and an end state; each column represents a particular value, and each row corresponds to a particular variable. Each path through this lattice corresponds to an assignment of values to all the variables: for example, this path sets h1 = 0, h2 = 2, and h3 = 1.
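To make the correspondence concrete: a start-to-end path picks one value per row (variable), so paths and assignments match up one-to-one. A quick sketch of the counting, for the three-variable lattice above:

```python
from itertools import product

POSITIONS = (0, 1, 2)
N_VARS = 3                                  # h1, h2, h3

# Each start-to-end path through the lattice chooses one value per variable,
# so the paths are exactly the assignments (h1, h2, h3).
paths = list(product(POSITIONS, repeat=N_VARS))

n_paths = len(paths)                        # 3^3 = 27 assignments
n_nodes = N_VARS * len(POSITIONS) + 2       # 9 lattice nodes, plus start and end
```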
[00:05:24] So this is just a very compact way of representing all exponentially many assignments in a polynomial-sized object.

[00:05:35] Now I'm going to attach weights to the edges. An edge from start to any of the initial states has as its weight the start probability times the first emission probability. For example, this edge here has weight p(h1 = 0) times p(e1 = 0 | h1 = 0), because e1, remember, was observed to be 0, so I've plugged in that evidence. The subsequent edges go between some h_{i-1} and some h_i, and each has as its weight the transition probability times the emission probability of the destination state. For example, this edge here has weight p(h2 = 0 | h1 = 0), the transition, times p(e2 = 2 | h2 = 0), where e2 = 2 is what we observed as evidence. And this one is p(h3 = 0 | h2 = 0) times the corresponding emission probability. The final edge into the end state doesn't have anything on it, so assume its weight is 1.

[00:06:53] Now each path from start to end, as we stated before, is an assignment of all the variables, but in particular it has a weight equal to the product of the edge weights. So this path here has a weight which is simply the product of all these purple numbers, and that weight is actually the joint probability of this particular assignment and the evidence.

[00:07:32] Okay, so now the key part: a smoothing query such as P(H_i = h_i | E = e) is simply the weighted fraction of paths through H_i = h_i. For example, if I'm interested in the probability of h2 = 1 conditioned on the evidence, what I'm really asking, in the context of this lattice, is: what fraction of paths pass through this node, compared to all paths? Stated differently, I look at all the paths through this node, add up their weights, and divide by the sum of the weights over all paths. This gives us a really nice graphical interpretation of smoothing queries.

[00:08:32] So now we can compute those quantities using a recurrence. I'm going to define two types of objects: forward messages and backward messages. Here's our lattice. The forward message for each node here is written F_i(h_i), and it is the sum of the weights of paths from the start to a particular node H_i = h_i. So for example, F_2(1) is the sum of the
weights of all paths from start to h2 = 1. I can compute this recursively as follows: all paths that go from start to this node have to pass through some previous position, so I sum over all possible values h_{i-1} of the previous variable, recursing on F_{i-1}(h_{i-1}), the sum of the weights of paths to each of these previous locations, times the weight along the edge from that particular h_{i-1} to h_i.

[00:10:05] The backward message is analogous: B_i(h_i) is the sum of the weights of all paths from a particular node H_i = h_i to the end. So B_2(1) is the sum over all paths from this node to the end. This again is recursively defined by looking at all next nodes h_{i+1}, recursing on B_{i+1}(h_{i+1}), times the weight of the edge between h_i and h_{i+1}.
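Writing w(h_{i-1} → h_i) for the edge weight (transition times emission, as defined above), the two recurrences with their base cases are:

```latex
F_1(h_1) = p(h_1)\, p(e_1 \mid h_1), \qquad
F_i(h_i) = \sum_{h_{i-1}} F_{i-1}(h_{i-1}) \,\underbrace{p(h_i \mid h_{i-1})\, p(e_i \mid h_i)}_{w(h_{i-1} \to h_i)}

B_n(h_n) = 1, \qquad
B_i(h_i) = \sum_{h_{i+1}} \underbrace{p(h_{i+1} \mid h_i)\, p(e_{i+1} \mid h_{i+1})}_{w(h_i \to h_{i+1})} \, B_{i+1}(h_{i+1})
```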
[00:10:53] Okay, so now, having defined forward and backward messages, I can multiply them together to form S_i, and my claim is that the sum of the weights of all paths from start to end that go through a particular node is exactly S_i(h_i). For example, looking at this node again: F_i accounts for all the ways to get from start to this node, and B_i accounts for all the ways to get from this node to the end, so if I multiply those two quantities together, I get all the paths from start to end that go through this node.

[00:11:43] So now we're almost done. We can take these S_i's and normalize them over all the possible values that h_i could take, and that gives us exactly the probability of H_i = h_i given the evidence. This is exactly the smoothing quantity we were looking for: what is the probability of h2 = 1 conditioned on the evidence?

[00:12:15] Putting things together, the forward-backward algorithm simply computes all the forward messages, proceeding from 1 to 2 to 3 all the way up to n, where F_i depends on F_{i-1}, so I'm going forward. Then it computes all the backward messages, going from n down to 1, because B_i depends on B_{i+1}. Then I multiply the F_i and the B_i together to compute S_i and normalize, and that gives me the answer to the smoothing question.

[00:12:53] As for the runtime of this algorithm: we have n time steps, and for each time step there is a number of domain elements to consider, so n times the domain size is the number of nodes in the lattice; and there is another multiplicative factor of the domain size to compute the recurrence, giving O(n · |Domain|²).
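Putting the whole algorithm together in code, here is a minimal sketch for the object-tracking model, with the same assumed zero-weight handling of out-of-range steps as before; the names F, B, and S follow the lecture's messages.

```python
POSITIONS = (0, 1, 2)
STEP = {-1: 0.25, 0: 0.5, +1: 0.25}

def w_start(h, e):                      # start edge: p(h1) * p(e1 | h1)
    return (1 / 3) * STEP.get(e - h, 0.0)

def w_edge(h_prev, h, e):               # p(h_i | h_{i-1}) * p(e_i | h_i)
    return STEP.get(h - h_prev, 0.0) * STEP.get(e - h, 0.0)

def forward_backward(es):
    """Smoothing distributions P(h_i | e) for every i, via F, B, S messages."""
    n = len(es)
    # Forward pass: F[i][h] = sum of path weights from start to node (i, h).
    F = [{h: w_start(h, es[0]) for h in POSITIONS}]
    for i in range(1, n):
        F.append({h: sum(F[i - 1][hp] * w_edge(hp, h, es[i]) for hp in POSITIONS)
                  for h in POSITIONS})
    # Backward pass: B[i][h] = sum of path weights from node (i, h) to end.
    B = [None] * n
    B[n - 1] = {h: 1.0 for h in POSITIONS}
    for i in range(n - 2, -1, -1):
        B[i] = {h: sum(w_edge(h, hn, es[i + 1]) * B[i + 1][hn] for hn in POSITIONS)
                for h in POSITIONS}
    # S_i = F_i * B_i, normalized over the values h_i can take.
    smoothed = []
    for i in range(n):
        S = {h: F[i][h] * B[i][h] for h in POSITIONS}
        Z = sum(S.values())
        smoothed.append({h: S[h] / Z for h in POSITIONS})
    return smoothed
```

Here forward_backward([0, 2, 2])[1] is the smoothing distribution over h2 given all three observations; note that the single pair of passes yields the answer for every i at once.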
recurrence, and this is exactly the number of edges in this lattice.

[00:13:23] One other note: the forward-backward algorithm actually computes all the smoothing queries, for every i, and the time complexity for computing all of them is exactly the same as for computing any individual one. That's because there's a lot of shared computation: the forward message computed here is used down here and here, and the same goes for the backward messages in the other direction.

[00:13:58] So let's look at a quick demo of this in action. Here is the object-tracking HMM. We have h1 through h3, and we have the various local probabilities p(h1), p(e1 | h1), p(h2 | h1), p(e2 | h2), p(h3 | h2), and so on. Now I'm interested in the probability of h2. Notice that I'm actually not going to run forward-backward; I'm going to run a more general algorithm called variable elimination. The details are a little bit different, so don't worry about them too much; I just want to give you a flavor of how it works.

[00:14:48] The first thing I do is compute a factor, which is actually the forward message, where I've summed out the previous time step, h1. Then I compute another factor, the backward message, which sums out h3. Then I multiply them together, and I get the probability of h2: 0.61 and 0.3 for the first two values.

[00:15:31] All right, so to summarize: we've presented the forward-backward algorithm for probabilistic inference in HMMs, in particular for answering smoothing questions.
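The smoothing computation walked through in the demo can be sketched numerically. This is a minimal sketch of forward-backward, assuming illustrative transition and emission tables (a random walk over {0, 1, 2}, renormalized at the boundaries) rather than the demo's exact numbers:

```python
# A minimal numeric sketch of forward-backward smoothing on a 3-state
# object-tracking HMM. The transition/emission tables are illustrative
# stand-ins (random-walk style), not necessarily the demo's exact numbers.

def walk_table(n=3, p_stay=0.5, p_move=0.25):
    """p(next | cur): stay w.p. 1/2, move +/-1 w.p. 1/4, renormalized at edges."""
    table = []
    for a in range(n):
        row = [p_stay if b == a else p_move if abs(b - a) == 1 else 0.0
               for b in range(n)]
        z = sum(row)
        table.append([p / z for p in row])
    return table

trans = walk_table()        # p(h_{i+1} | h_i)
emis = walk_table()         # p(e_i | h_i), same shape for simplicity
start = [1 / 3] * 3         # p(h1) uniform over {0, 1, 2}
evidence = [0, 2, 2]        # observed e1, e2, e3

# Forward messages: F_i(h) = p(h_i = h, e_1..i)
F = [[start[h] * emis[h][evidence[0]] for h in range(3)]]
for e in evidence[1:]:
    prev = F[-1]
    F.append([sum(prev[a] * trans[a][h] for a in range(3)) * emis[h][e]
              for h in range(3)])

# Backward messages: B_i(h) = p(e_{i+1}..e_n | h_i = h)
B = [[1.0] * 3]
for e in reversed(evidence[1:]):
    nxt = B[0]
    B.insert(0, [sum(trans[h][b] * emis[b][e] * nxt[b] for b in range(3))
                 for h in range(3)])

# Smoothing: p(h_i | e_1..n) is proportional to F_i(h) * B_i(h). All queries
# reuse the same messages, which is the shared computation noted above.
for i in range(3):
    s = [F[i][h] * B[i][h] for h in range(3)]
    z = sum(s)
    print(f"p(h{i + 1} | e1:3) =", [round(p / z, 3) for p in s])
```

Because F and B are computed once, the final loop reads off every smoothing marginal at no extra asymptotic cost, which is exactly the shared-computation point made above.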
The key idea behind forward-backward is the lattice representation, which allows us to compactly represent paths as assignments, with the weight of each assignment being the product of the edge weights. That lets us define a dynamic program which computes the forward and backward messages in an efficient way. If you then multiply the forward and backward messages and normalize, you can compute all the smoothing queries you want in the same amount of time as computing any one of them, because there's a lot of shared computation.

[00:16:25] All right, that's the end.

================================================================================
LECTURE 037
================================================================================
Bayesian Networks 6 - Particle Filtering | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=8sOtXbQIOuE
---

Transcript

[00:00:05] Hi. In this module I'm going to present the particle filtering algorithm for performing approximate inference in hidden Markov models, which is really useful when the size of the domain of the variables is very large.

[00:00:18] So let's start with our familiar object-tracking HMM. For every time step we have a position h_i of an object, which we don't see; instead we see some noisy sensor readings. The probabilistic story of object tracking is as follows: the first object position, h1, is generated uniformly across the values 0, 1, 2. Subsequently, h2 is generated conditioned on h1 via the transition distribution, which goes up with probability one quarter, stays the same with probability one half, and goes down with probability one quarter, and similarly for h3. At each time step we get a sensor reading e1, e2, e3 conditioned on the respective location, governed by the emission distribution, which, given h_i, goes up by one with probability one quarter, stays the same with probability one half, and goes down by one with probability one quarter.

[00:01:25] Now you multiply all of these local conditional distributions together, and you get one glorious joint distribution over all the object locations as well as the sensor readings. That's our HMM.

[00:01:41] Given this HMM, we can ask all sorts of questions; in particular, we've looked at filtering and smoothing questions. Particle filtering, as the name might suggest, does filtering, so let's focus on filtering. In filtering, we're asking for the position of the object at a particular time step conditioned on the past evidence. At time step one, I look only at the evidence at time step one and ask: where is the object? At time step two, I now have two observations, and I ask where the object is at time step two. At time step three, I have three observations, and I ask where the object is at time step three.

[00:02:30] Now, I could apply the forward-backward algorithm to this scenario,
and that would work. But the problem is: suppose you have a setting where there are many, many location values for each h_i. In our simple example there are only three, but in practice there could be a hundred thousand, and in that case forward-backward is going to be really, really slow, because its running time scales quadratically with the number of values: a hundred thousand squared, which is not nice. The goal of particle filtering is to exploit the observation that, while you may have a hundred thousand possible values, only a very small fraction of them are really likely given the data.

[00:03:15] To start introducing particle filtering, let us revisit beam search, because structurally particle filtering and beam search are analogous. In beam search, remember, the idea was to keep track of at most k partial assignments, which we're going to call particles in the particle filtering lingo. Beam search starts with a candidate set of only one assignment, the empty assignment. It goes through each variable in turn, from 1 to n. For each partial assignment over h1 through h_{i-1}, I consider all possible values I can assign to h_i and extend the assignment, so now we get a bunch of assignments over h1 through h_i. Then I compute the weight of each of these candidate particles, and I take the k highest-weight particles.

[00:04:24] So let's recall beam search on this example. Here we have our object-tracking HMM, with the variables and all the local conditional distributions, and I'm observing 0, 2, 2. Beam search starts out extending to variable h1, and it produces 0, 1, and 2 as the possible particles, with these probabilities as their weights. I prune, which does nothing because k is three. Next, I extend to h2: each of these particles multiplies into three particles, and the weight of each particle now also includes the factors for the transition into h2 and the emission p(e2 = 2 | h2). Now I prune down to three. I extend to h3, and I prune, and at the end I get a set of particles: here I have 0 1 2, I have 0 1 1, and I also have 1 2 2, and each particle has some weight.

[00:05:56] Normally we presented beam search in the context of finding maximum-weight assignments, so in this case you would just return the highest-weight particle, 1 2 2. But in particle filtering we're interested in answering filtering queries. So what we do instead is normalize the weights over all the particles to get an approximate distribution over assignments. We then pretend this is the joint distribution over h1 through hn given the evidence, and read off probabilities to answer, approximately, any smoothing or filtering query we like.

[00:06:42] This is fine, but it has two problems. One is that the extend step is slow, because it requires considering every possible value of h_i. Sometimes you can be clever: you don't have to enumerate all the values in the domain, only the values that are going to have positive weight. But even that could be a lot.
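The extend/prune procedure recalled above, followed by the normalization used for filtering, might be sketched as follows; the tables, the beam size k, and the evidence sequence here are illustrative assumptions:

```python
# A hypothetical sketch of beam search over an HMM, as described above:
# extend each particle with every value of the next hidden variable, keep the
# k highest-weight particles, and finally normalize the surviving weights
# into an approximate distribution over assignments.

def beam_filter(start, trans, emis, evidence, k):
    beam = [((), 1.0)]                       # (partial assignment, weight)
    for i, e in enumerate(evidence):
        candidates = []
        for assign, w in beam:               # extend: every value of h_i
            prior = start if i == 0 else trans[assign[-1]]
            for h, p in enumerate(prior):
                if p > 0:
                    candidates.append((assign + (h,), w * p * emis[h][e]))
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]  # prune
    z = sum(w for _, w in beam)
    return {a: w / z for a, w in beam}       # normalized, approximate

# Illustrative 3-state tables: stay w.p. 1/2, move +/-1 w.p. 1/4,
# renormalized at the boundaries.
trans = [[2/3, 1/3, 0], [1/4, 1/2, 1/4], [0, 1/3, 2/3]]
emis = trans                                  # same shape, for illustration
start = [1/3] * 3
print(beam_filter(start, trans, emis, [0, 2, 2], k=3))
```

Note the extend loop visits every value in the domain of h_i, which is the slowness problem just discussed.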
[00:07:16] The second problem is that we are greedily taking the k particles with the highest weight, and, as we'll see later, this doesn't provide enough diversity. So we're going to have to do something about it. Particle filtering solves both of these issues. It consists of three steps, propose, weight, and resample, which replace the extend-prune two-step procedure of beam search. We'll go through them in turn.

[00:07:54] The first step is propose. In general, you should think of the set of particles as approximating a certain distribution; in particular, the filtering distribution is the probability of the variables you're considering conditioned on the evidence so far. Suppose we have just two particles: 0 1 and 1 2. The propose step takes each of these particles, and I just sample a value for h3, the next variable, from the transition distribution, which, remember, was up or down with probability one quarter and the same with probability one half. That produces these extended particles, conditioned on the same evidence. For example, I take 0 1 and extend it; this produces the particle 0 1 1 with probability one half, because I'm keeping the value the same. And I take the other particle and extend it to 2, which also happens with probability one half. Now, this is a randomized algorithm, so I could have sampled differently from the distribution: I could have gotten a 1 here, or a 3 here. But let's just go with 1 2 2.

[00:09:32] In the next step, I'm going to weight. You should think of these particles as a guess as to what h3 is going to be, but we need to fact-check this guess against the evidence. So the weighting step assigns a weight to each particle, and that weight is the probability of the new evidence conditioned on h3. This produces a set of new, weighted particles representing the distribution over h1, h2, h3 conditioned on all the evidence so far.

[00:10:23] Let's work out this example. For the first particle I have h3 = 1, and h3 = 1 generates the evidence e3 = 2 with probability one quarter, so I attach a weight of one quarter to the first particle. The second particle (this should be a 2 here, actually) has h3 = 2, and looking up the table, the probability of generating a 2 given a 2 is one half, so I put a weight of one half on this particle.

[00:11:08] At this point I have a set of particles that represent the advanced filtering distribution. But notice that the weights are not all the same: some are small and some are big. In particular, particles with small weight are kind of wasting space. You should think of the k particles as a limited resource for representing this distribution, so if you have a particle with weight 0.0001, or maybe even 0, then we certainly shouldn't be wasting one of the valuable k slots on that value. So what we're going to do is reallocate our resources via resampling.
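The propose and weight steps on this two-particle example can be sketched as below, using the lecture's stay-with-probability-1/2, move-with-probability-1/4 numbers; the helper names are hypothetical:

```python
import random

# A sketch of the propose and weight steps on the two-particle example above,
# using the lecture's transition/emission numbers: stay with probability 1/2,
# move one step up or down with probability 1/4 each.

def p_step(a, b):
    """p(b | a) for both the transition and the emission distribution."""
    if b == a:
        return 0.5
    if abs(b - a) == 1:
        return 0.25
    return 0.0

particles = [(0, 1), (1, 2)]    # current guesses for (h1, h2)
e3 = 2                          # newly observed evidence

def propose(particle):
    """Extend a particle by sampling h3 from the transition distribution."""
    h2 = particle[-1]
    candidates = [h2 - 1, h2, h2 + 1]
    h3 = random.choices(candidates, weights=[0.25, 0.5, 0.25])[0]
    return particle + (h3,)

def weight(particle):
    """Fact-check the guess against the evidence: weight = p(e3 | h3)."""
    return p_step(particle[-1], e3)

random.seed(0)
extended = [propose(p) for p in particles]
weights = [weight(p) for p in extended]
print(extended, weights)
```

With the extensions 0 1 1 and 1 2 2 sampled in the lecture, the weights come out to 1/4 and 1/2, which normalize to the 1/3 and 2/3 used next in the resampling step.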
In the resampling step, we normalize these weights and draw k samples. Normalizing the weights produces the distribution one third, two thirds, and now I draw k samples from that distribution to redistribute. The resulting particles still represent the same distribution, just in a slightly different way, without weights. So I sample: maybe I get 1 2 2, which happens with probability two thirds. I sample again, and maybe I get the same particle again, with probability two thirds. Of course, this again is a randomized algorithm, so I could have gotten the first particle and then the second, or the second and then the first, or the first one twice.

[00:12:56] Now you might wonder: why are we resampling? Why leave the result of the algorithm up to chance? To see why, consider the following setting. We have a distribution over a bunch of possible locations, and suppose that distribution is very close to uniform: maybe you can see that there's slightly higher probability in the middle, but it's pretty flat. Now if you did beam search, which takes the k positions with the highest weight, you would end up with all the particles clustering around the middle, which is really not representative of the distribution, because all these positions out here have non-negligible probability mass but get no support. It's kind of like putting all your eggs in the same basket, or the same k baskets, I guess. Instead, if you resample, that is, sample from this distribution k times, you get something more like this, which I would argue is more representative of the distribution.
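This diversity argument can be made concrete with a small sketch comparing top-k selection against resampling on a near-uniform distribution; all the numbers here are illustrative:

```python
import random
from collections import Counter

# Near-uniform weights over 21 locations, with a slight bump in the middle.
locations = list(range(21))
weights = [1.2 if 8 <= x <= 12 else 1.0 for x in locations]
z = sum(weights)
probs = [w / z for w in weights]
k = 5

# Beam-search style: keep the k highest-weight locations.
top_k = sorted(locations, key=lambda x: -weights[x])[:k]

# Particle-filtering style: draw k independent samples from the distribution.
random.seed(0)
resampled = random.choices(locations, weights=probs, k=k)

print("top-k:    ", sorted(top_k))         # clusters around the middle bump
print("resampled:", sorted(Counter(resampled).elements()))
```

Top-k deterministically returns the five middle locations, while resampling typically spreads the particles across the whole range, preserving the uncertainty in the distribution.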
In cases where most of the weight is on a few locations, sampling versus taking the top k is not really a big deal. But when there's high uncertainty, as in this example, sampling is really important because it allows you to maintain uncertainty.

[00:14:31] So now we're ready to present the final particle filtering algorithm, which again is structured very similarly to beam search. Like beam search, we initialize with the empty assignment, and for each time step we propose, weight, and resample. In the propose step, we take each assignment to h1 through h_{i-1}, look at the transition distribution, and generate one possible assignment to h_i, and I just take that. In beam search I would consider all of them, which can result in a blow-up, but with particles I only look at one. Then I weight the particles based on the evidence, using the emission distribution. And finally I redistribute my resources by normalizing the weight distribution and drawing k particles independently from it.

[00:15:41] Okay, so let's see a demo. Here I have my object-tracking HMM, and I'm going to run particle filtering with 100 particles instead of beam search. I start out by extending to just the first variable, and now I have a hundred particles: 38 of them are at 0, 33 of them at 1, and 29 of them at 2, and these are the weights. I resample, and now probability is redistributed to 0 and 1 with these particle counts. Now I extend, and notice that where before I had 73 particles at 0, now 51 of them go to h2 = 0 and 22 of them go to h2 = 1.
to [00:16:52] and then i'm going to resample [00:16:55] resample which redistributes the particles again [00:16:58] which redistributes the particles again now i'm going to propose and [00:17:02] now i'm going to propose and re-weight now all the particles [00:17:05] re-weight now all the particles are [00:17:06] are all over the place [00:17:08] all over the place and now i redistribute mass [00:17:11] and now i redistribute mass so that the particles are [00:17:13] so that the particles are used more effectively [00:17:16] used more effectively okay so now at the end [00:17:19] okay so now at the end i have now a [00:17:21] i have now a 100 particles [00:17:23] 100 particles covering all these different assignments [00:17:25] covering all these different assignments i can simply [00:17:27] i can simply count the fraction of them [00:17:29] count the fraction of them for [00:17:30] for that satisfied various values of h3 and [00:17:32] that satisfied various values of h3 and i get my approximate filtering [00:17:35] i get my approximate filtering distribution over h3 condition [00:17:42] okay so there are two [00:17:44] okay so there are two ways to make particle filtering more [00:17:47] ways to make particle filtering more efficient [00:17:48] efficient so particle filtering we've casted [00:17:52] so particle filtering we've casted in terms of generating a distribution [00:17:54] in terms of generating a distribution over [00:17:54] over complete assignments to all the [00:17:56] complete assignments to all the variables [00:17:57] variables but if you're only interested in [00:17:59] but if you're only interested in filtering queries which [00:18:01] filtering queries which look at the last variables [00:18:03] look at the last variables then what we can do is instead of [00:18:06] then what we can do is instead of storing all [00:18:08] storing all um the assignments we only we only need [00:18:11] um the assignments we only we only need to keep the value of 
the last hi. so i'm only going to look at h3, because this is sufficient to continue the algorithm forward. [00:18:25] and furthermore, if you have multiple particles that have the same value, you can actually just store the counts; as we saw in the demo, one occurs twice and two occurs three times. [00:18:40] now let's visualize particle filtering in a more realistic, interactive object tracking setting. okay, so here we have a grid, and we have an object that's going to be moving in this grid, where we're trying to determine its location. [00:19:01] so the hmm is going to have a transition distribution that places a uniform distribution over moving north, south, east, west, or staying put, and the emission distribution is going to put a uniform distribution over locations that are within three steps either vertically or horizontally. so you can kind of see this
definition of this emission distribution, which only depends on the x-distance and the y-distance, and it's going to put a uniform distribution over basically a box. [00:19:38] okay, so if i hit ctrl-enter here, we can see the observations; they're very noisy, and we're trying to guess where the object is. so i don't know, it's somewhere. [00:19:53] so what we're going to do is run particle filtering; let's say we have 10,000 particles. we hit ctrl-enter again, and now what we're going to see is a red blob, and this represents where the particles are, with the intensity representing the number of particles at that particular location. [00:20:24] so you'll see that this is kind of our best guess of where the object is. [00:20:32] okay, so you can see how well we're doing by showing the true position. so
let's see where this object actually is, and we'll see that we're tracking it, you know, rather well. sometimes, i think you'll notice, it might mess up, but on the whole it's pretty good. [00:20:52] so also notice that the red blob, where it thinks the object is, is not fooled by where the observation is, because there's enough noise here; what the particle filter is doing is essentially smoothing out the noise. the noise is jumping around a lot, but it's kind of tracking, and it knows that the object can't be teleporting; it's moving by at most one step each time step. [00:21:29] so you can play with this demo a bit more. we've also implemented gaussian noise instead of this box noise, which looks kind of similar, more of a spherical blob. you can also play with a kind of really weird-looking noise, which
places uniform distributions over all positions on this lattice that have a certain kind of parity. [00:22:01] okay, so in summary, we've presented the particle filtering algorithm, which allows us to answer filtering questions of the following form: where is this object at a particular time step, given the evidence so far? and the key idea is using particles to represent this approximate distribution. [00:22:26] so remember, particle filtering has three steps, which are used to advance the set of particles. first we propose, where we take particles and transition them to the next time step; this is a guess of where the object is going to be at the next time step. then we're going to fact-check our guess by re-weighting the particles based on the emission distribution of what we actually saw. and then we're going to reallocate our resources by resampling, and this
will allow the particles to occupy the regions with higher weight. [00:23:14] so unlike the forward-backward algorithm, particle filtering allows us to scale up to cases where there are a large number of locations. and also, unlike beam search, it allows us to maintain better particle diversity, especially in situations where the distribution is close to uniform. [00:23:36] now, particle filtering is also called sequential monte carlo, and there are many, many more sophisticated extensions that i haven't covered. in particular, particle filtering works for general factor graphs, not just hidden markov models, and i encourage you to read up and learn more about it. that's all ================================================================================ LECTURE 038 ================================================================================ Bayesian Networks 7 - Supervised Learning | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=_rbDjsJTgm8 --- Transcript [00:00:05] so far we've introduced bayesian networks and talked about how to perform
[00:00:09] networks and talked about how to perform inference in them in this module we'll [00:00:11] inference in them in this module we'll turn to the question of how to learn [00:00:13] turn to the question of how to learn them from [00:00:15] them from so recall that a bayesian network [00:00:17] so recall that a bayesian network consists of a set of random variables [00:00:19] consists of a set of random variables for example [00:00:20] for example of cold allergies [00:00:22] of cold allergies off [00:00:23] off and itchy eyes [00:00:25] and itchy eyes the bayesian network also comes equipped [00:00:27] the bayesian network also comes equipped with a dag specifying the qualitative [00:00:30] with a dag specifying the qualitative relationships between all these [00:00:33] relationships between all these different variables [00:00:35] different variables quantitatively however the bayesian [00:00:37] quantitatively however the bayesian network defines [00:00:39] network defines a set of local conditional distributions [00:00:41] a set of local conditional distributions over each variable x i given the parents [00:00:45] over each variable x i given the parents of i [00:00:48] and in this example we would have [00:00:50] and in this example we would have probability of c given parents which are [00:00:52] probability of c given parents which are none probability of a probability of h [00:00:55] none probability of a probability of h given its two parents c and a and [00:00:57] given its two parents c and a and probability of i given a [00:01:01] probability of i given a so finally if we multiply all of these [00:01:04] so finally if we multiply all of these probability distributions together [00:01:07] probability distributions together then we get [00:01:08] then we get what is the joint distribution all [00:01:11] what is the joint distribution all random variables in this case we have a [00:01:15] random variables in this case we have a c [00:01:16] c a 
h, and i. [00:01:22] then there's the question of how you do inference in bayesian networks. so for inference, remember, you're given the bayesian network, you're given some evidence that you observe over a subset of the variables, for example h and i equal one and one, and then you're given a query variable, which is something that you're interested in, let's say cold. and the inference algorithm is going to produce a distribution over your query variables conditioned on the evidence; so for every possible setting of the query variable we have a probability. [00:01:56] so we saw many ways of doing this, including manually by exhaustive enumeration; we can convert bayesian networks into markov networks and do gibbs sampling; and then for hmms we have specialized techniques such as the forward-backward algorithm and particle filtering. [00:02:15] so inference assumes that all these local conditional distributions are
known. [00:02:20] but the big question is, where did all these come from? so all these numbers are called the parameters of the bayesian network, the red question marks, and in general we might not know what they are. [00:02:36] so let's try to learn them. so again, as in all learning tasks, we start with the data. in this case the training data is going to include examples where each example is a complete assignment to x; so this is the fully supervised setting, which is the simplest one to start out with. and then the learning algorithm is going to produce parameters, and the parameters are exactly all these red question marks; these are all the local conditional probabilities. [00:03:06] so we're going to go through a bunch of examples and then later show a general principle that ties all of them together. so you might be feeling a little bit that this might be very
challenging, because probabilistic inference assumes you know the parameters, and it was already pretty hard, both computationally and perhaps even conceptually. but it turns out that for bayesian networks, at least somewhat surprisingly, if you're learning from fully supervised data, learning actually turns out to be surprisingly easy. [00:03:42] so let's begin. so suppose you're developing bayesian networks to model how people rate movies. so let's start with the world's simplest bayesian network, which has one variable r, which represents the rating of a movie. so the joint distribution is just p of r in this case; the movie rating can be one through five. [00:04:04] so first we have to identify what the parameters are. so the parameters here, theta, are just the probability of one, the probability of two, the probability of three, the probability of four, and the probability of five. there are five parameters, and if you're a little bit clever you only need
four of them, because the five numbers have to sum to one, but for the sake of simplicity let's just say there are five parameters. [00:04:27] okay, and now you're given some training data, some ratings from users: you have a one, you have a three, you have a bunch of fours, and three fives. and now the question is, how do you estimate the parameters given the training data? [00:04:42] let's just follow our noses here. well, intuitively, you would think that the probability of a rating is proportional to the number of occurrences of that particular rating in the training data. now, this is just intuition; it might be a good thing or it might not be a good thing, we'll find out later, but let's just go with that for now. [00:05:03] so here's the training data, and what i'm going to do: the parameters are a probability table, so we're going to see a
lot of these over the course of the next few slides. so for every rating, i'm going to count the number of times it shows up: 1 shows up once, 2 shows up 0 times, 3 shows up once, 4 shows up five times, and 5 shows up three times. and now i'm just going to sum up all the counts, which gives me 10, and i'm going to normalize to get my probabilities, and that's the probability estimate. that's it: count and normalize. [00:05:41] okay, so let's level up a little bit and talk about two variables. suppose that now the rating is governed by the genre; so in particular, the bayesian network is: you first generate the genre, and then you generate the rating given the genre. [00:05:58] so now the parameters of this bayesian network include both the probability of the genre, which has two parameters, and the probability of the rating given the genre, which includes two times five parameters, so 10
parameters, for a total of 12 parameters; again, if you're being clever, you can get that down to nine. [00:06:19] so now we're given some training data. each training point, remember, is a full assignment to all the variables, so we have g equals d and r equals four here. [00:06:35] so now, how do we estimate the parameters given this more complicated network? so following our noses again, the intuitive strategy is that we're just going to estimate each local conditional distribution separately and see what happens. okay, so what does that mean? that means for the probability of g, i'm just going to count the number of times particular values of g show up: so d shows up one, two, three times, and c shows up twice. notice that this is kind of the same calculation as we had before. so now this is three-fifths and two-fifths if you sum up and normalize. [00:07:19] okay.
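the count-and-normalize recipe just described can be sketched in a few lines of code. this is an illustrative sketch, not the course's own code: the helper name `count_and_normalize` and the tiny dataset (genre labels 'd' and 'c') are assumptions, chosen only to reproduce the counts read off in the lecture.

```python
from collections import Counter

def count_and_normalize(pairs):
    """MLE of a local conditional distribution p(child | parents).

    `pairs` is a list of (parents, child) tuples; counts are normalized
    separately within each parents assignment (the "slice" of the data).
    """
    counts = Counter(pairs)
    totals = Counter(parents for parents, _ in pairs)
    return {(parents, child): n / totals[parents]
            for (parents, child), n in counts.items()}

# Hypothetical training set matching the lecture's counts:
# d shows up 3 times, c twice; d4 twice, d5 once, c1 once, c5 once.
examples = [('d', 4), ('d', 4), ('d', 5), ('c', 1), ('c', 5)]

# p(g): condition on the empty parent assignment ().
p_g = count_and_normalize([((), g) for g, _ in examples])
# p(r | g): slice by genre, then normalize within each slice.
p_r_given_g = count_and_normalize([((g,), r) for g, r in examples])

print(p_g)          # p(g=d) = 3/5, p(g=c) = 2/5
print(p_r_given_g)  # e.g. p(r=4 | g=d) = 2/3, p(r=1 | g=c) = 1/2
```

the same function handles every local conditional distribution in the lecture, because the only thing that changes is which variables go into the `parents` tuple.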
so in estimating p of g, i simply look at only the slice of the examples that matter for this. and same with the probability of r given g: now i'm going to look at all the possible assignments to the parent of a particular node and also that node itself, so that's g and r. so d4 shows up twice, d5 shows up once, c1 shows up once, and c5 shows up once. now i count and normalize, and i get my probability estimate of r given g. [00:08:00] okay, so far so good. so in summary: consider each local conditional distribution separately, and then count based on the slice of the data that matters, and normalize. [00:08:13] so now let's consider three variables. we have the genre, whether the movie won an award or not, and the rating. so here we have the genre and whether it won an award influencing how well the movie is rated. the joint distribution
is p of g, times p of a, times p of r given g and a. [00:08:35] so now we have local conditional distributions for each of these factors here. [00:08:44] so remember that v-structures, this type of structure, were really special in bayesian networks: they give rise to explaining away, and the thing is that if you marginalize out unobserved leaves, the parents remain independent; it was really a hallmark of v-structures. but from the perspective of learning, there's really nothing special here. [00:09:06] and to see this, what we're going to do is just, you know, suppose we have some training data which includes assignments to all three variables; we're just going to count and normalize again. okay, so here we're going to start with p of g; this is exactly the same thing as before, we just look at only the genre. and then we're going to look at a, which analogously means looking at only 0, 1, 0, 1, and
counting and normalizing. and now the big local conditional distribution is p of r given g and a. so here i'm going to look at the parents of r, and r itself, and i'm going to count the number of times each local configuration happens. so i have d01 showing up once, d03 showing up once, and d15 showing up once, each of these showing up once. and now i want to normalize, so i have to be a little bit careful: i don't want to add all these numbers together and normalize, because this is conditioned on g and a. so that means for every setting of g and a, i have a distribution over r. [00:10:36] i'm going to look at d0: i have one occurrence of r equals one and one occurrence of r equals three, so if i normalize that, it's going to give me a half and a half. and now for this setting of g and
a, i only have one possibility of r, so that has probability one, and same for these other ones. [00:11:02] so again, everything is count and normalize, where you have to pay attention to what you're normalizing over: you're only normalizing over possible values of r, not g and a. [00:11:19] so one thing you might note is that a lot of these probabilities are one, and the probabilities that are not mentioned here are zero. so you might wonder whether this is a good estimate, but we'll come back to that later. [00:11:34] so now let's invert the v-structure and look at a different structure. we have the genre, and suppose we have two people, jim and martha, who are both going to rate this movie, and both of their ratings depend on the genre; so g generates r1 and also generates r2. [00:11:55] so now we have this three-node bayesian network, and the estimation is going to be the same; i'll just go through it very
we have parameters one for [00:12:08] quickly so we have parameters one for every variable here [00:12:10] every variable here and so [00:12:12] and so probability of [00:12:14] probability of g [00:12:15] g is count to normalize [00:12:17] is count to normalize probability of [00:12:19] probability of r1 [00:12:20] r1 given g [00:12:21] given g is you count and normalize again [00:12:23] is you count and normalize again remember that [00:12:25] remember that i'm normalizing over [00:12:27] i'm normalizing over possible values of g so you can [00:12:29] possible values of g so you can partition the rows based on the value of [00:12:31] partition the rows based on the value of g so here i have [00:12:33] g so here i have two and one and i'm normalizing two [00:12:35] two and one and i'm normalizing two thirds and one thirds and g equals c is [00:12:38] thirds and one thirds and g equals c is just handled uh separately [00:12:40] just handled uh separately in a separate normalization [00:12:43] in a separate normalization and then um [00:12:44] and then um [Music] [00:12:45] [Music] r2 given g is analogous so i'm not going [00:12:48] r2 given g is analogous so i'm not going to go over this [00:12:51] so this is fine um [00:12:54] so this is fine um except for what i'm going to now do is [00:12:58] except for what i'm going to now do is think [00:12:58] think about the setting where suppose you have [00:13:01] about the setting where suppose you have not just two users but a thousand users [00:13:04] not just two users but a thousand users or a million users now you might be a [00:13:07] or a million users now you might be a little bit worried because now for every [00:13:10] little bit worried because now for every user you might have to [00:13:12] user you might have to have its own have their own [00:13:15] have its own have their own local conditional distribution [00:13:18] local conditional distribution and the number of parameters might just [00:13:20] and the 
[00:13:23] That means estimation might be hard, especially for new users. So we're going to consider something slightly different. It's going to be the same Bayesian network here, but the parameters are different: in particular, I'm going to consider a single parameter p of r given g, instead of having p of r1 and p of r2. So now how do I estimate the distributions of this model? Let's begin. The probability of g is the same as before. And now for the probability of r given g, I'm just going to count the number of times a particular local configuration shows up, either where r is r1 or r2. So d3 shows up once, here. D4 shows up three times: you have one, and two, and three. So notice I'm counting occurrences of both r1 and r2. And d5 shows up twice, here with r1 and here with r2. C1 shows up once, and c2 shows up once.
[00:14:49] C4 shows up once, and c5 shows up once as well. Now I just count and normalize: I look at all the d's, count them, sum and normalize; and I look at all the c's, count and normalize. Okay, so when I have only one distribution that is responsible for two nodes, I simply aggregate their counts and normalize. [00:15:16] So this is an important slide. The more general idea that I want to highlight is this idea of parameter sharing in Bayesian networks, and this happens when the local conditional distributions over different variables are actually the same. And to be very precise about that, I want you to look at the following picture. So we have g, r1, and r2. So far we've looked at Bayesian networks through the lens of inference, where we know that every variable comes with a local conditional distribution.
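The aggregate-and-normalize step for a shared distribution can be sketched in a few lines of Python (the toy (g, r1, r2) assignments below are hypothetical, not the slide's exact table):

```python
from collections import Counter, defaultdict

# Hypothetical full assignments (g, r1, r2); not the slide's exact table.
data = [("d", 4, 4), ("d", 4, 5), ("d", 3, 4), ("c", 1, 5), ("c", 5, 2)]

# One shared distribution p(r | g) powers both R1 and R2, so counts
# from both nodes are aggregated into the same table.
counts = defaultdict(Counter)
for g, r1, r2 in data:
    counts[g][r1] += 1
    counts[g][r2] += 1

# Normalize within each parent configuration g.
p_r_given_g = {g: {r: c / sum(cnt.values()) for r, c in cnt.items()}
               for g, cnt in counts.items()}
```

Without sharing, r1 and r2 would each get their own `counts` table, estimated from half as much data.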
[00:15:59] But we didn't worry about where that came from; it was just there. Now, for learning, it matters where it came from. So what we should think about is each of these variables being powered by a local conditional distribution. So g is powered by this table here, r1 is powered by this table, and in the case of parameter sharing, r2 is also powered by this table. So we have a Bayesian network, and behind the scenes you should think about all these tables, which have arrows kind of hooking up and providing juice to each of these variables. And now, if you didn't have parameter sharing, then r1 and r2 would be powered by different tables. [00:16:46] Now, this is an important point. When we're doing inference, you should think about that as reading from the parameters, and when you're reading, you don't care whether you have two copies of something or one copy of something, because you're getting the same thing.
[00:17:01] But in learning, we're writing to the parameters from the observed variables, and in that case, you need to worry about whether you're writing to one memory location or two memory locations. So the right analogy is to think about programming, where you have pass by reference or pass by value. In parameter sharing, we're passing by reference: we're passing this parameter into each of these nodes, and when we do learning, we write back into those parameters, and it matters whether they're the same or not. [00:17:38] So when would you do parameter sharing like this? Well, it's a trade-off, and it's ultimately a modeling decision. By doing this, you aggregate your data, which means that you have more data per parameter, which allows you to get more reliable estimates. On the other hand, you end up with less expressive models.
[00:17:58] For example, if you had a lot of users, you might lose the ability to personalize if you share. And there are obviously many intermediate points as well, which we won't get into. [00:18:11] So let's look at some other Bayesian networks with parameter sharing. We already looked at naive Bayes before, but just to anchor it in this notation: let's say we have a genre and we have a movie review, and we have a Bayesian network which generates each word independently, conditioned on the genre. And so the joint distribution over everything is equal to the probability under p genre of y, times, for each word, the probability under p word of that particular word given y. So the parameters of this Bayesian network are p genre and p word. So now you can do a little exercise of how many parameters there are.
[00:19:12] So you look at theta. P genre: well, that's two parameters, because there are two genres. P word: that's two times the number of words, that is, the number of values that w_i can take on. And that's it. [00:19:27] So notice, importantly, that the number of parameters does not grow with L, even though the number of variables in the Bayesian network grows with L. So now we see that the complexity of the parameters and the number of variables can be quite different: you can have a million-variable Bayesian network but only one parameter, for example; that's quite possible. [00:19:50] So here's another example, our friendly HMM. We have actual positions of objects h1 through hn, and sensor readings e1 through en, and this should be very familiar by now. So you have an HMM, which has a joint distribution given by three distributions: p start of h1, times the transition probability of h_i given h_{i-1}, times, for each variable, the probability of emitting e_i given h_i.
[00:20:18] Again, the parameters are p start, p trans, and p emit. And you can think about how many parameters are in this Bayesian network: well, you have the number of positions, plus the number of positions squared, plus the number of positions times the number of possible sensor reading values. [00:20:43] Again, there is no dependence on the time window, the number of time steps here. And this is useful, because if you imagine tracking over a long period of time, you may have a million time steps, and you don't want the number of parameters to grow with that. [00:21:02] Okay, so here the training data is again going to be full assignments to all the random variables, and later, in a future module, we'll come back to the case where in practice you might only observe the sensor readings. But more on that later. So now let's present the general case.
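As a quick sanity check of that count, here is the arithmetic for a hypothetical HMM (the sizes 10 and 4 are made up for illustration):

```python
# Hypothetical sizes for illustration: the grid has 10 positions and
# the sensor can report 4 distinct reading values.
num_positions = 10
num_readings = 4

n_start = num_positions                # p_start(h1)
n_trans = num_positions ** 2           # p_trans(h_i | h_{i-1})
n_emit = num_positions * num_readings  # p_emit(e_i | h_i)
total = n_start + n_trans + n_emit     # 10 + 100 + 40 = 150

# No dependence on the number of time steps: a million-step chain
# is still described by the same 150 parameters.
```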
[00:21:22] Hopefully the intuitions have already been fleshed out, but I just want to write things down with some formal notation. So a Bayesian network, remember, includes variables x1 through xn, and now we have parameters, and the parameters are a collection of distributions. I'm going to write that as p subscript d, where d indexes into some set; for the HMM, for example, big D is {start, trans, emit}. So d is just a label, a name if you will. [00:22:01] So each variable x_i is generated from some distribution, and now the notation gets a little bit scary, but it's p sub d_i: that is the distribution that points into x_i, and I'm looking up that distribution by name. [00:22:23] So you can think about this more formally: this is just the equation defining what a Bayesian network is, that the joint distribution is the product of the local conditional distributions.
[00:22:36] But now I'm being very explicit that every variable i has a particular distribution d_i that is powering that variable. So the idea of parameter sharing is that d_i is just the same for multiple i's. [00:22:57] Okay, so here is the learning algorithm for general Bayesian networks. The input is a D train consisting of full assignments to all the variables x1 through xn, and the output is going to be all these distributions here. The algorithm is, again, just count and normalize. What we're going to do is go through every training example, which is a full assignment to all the variables, and for every variable in your Bayesian network, we're just going to increment a counter. [00:23:30] What this counter is: I look at which distribution is powering variable i, and I'm going to increment that counter for the local configuration, which is the assignment to its parents together with the value of x_i.
[00:23:45] And then I'm just going to normalize: for each distribution and each local assignment to its parents, I'm going to set the probability, under that distribution, of x_i given its parents to be proportional to this count. Okay, and that's it. [00:24:15] So far we've presented this count-and-normalize algorithm and shown a lot of examples, and hopefully this seems like a reasonable thing to do. But part of you might still be wondering: well, why is count and normalize a reasonable thing to do? And there is a higher principle here, and it's called maximum likelihood. [00:24:36] So the principle of maximum likelihood, which is a very old idea in statistics, is that we have our training data here, and we look at the product, over all examples in the training data, of the probability that the Bayesian network assigns to that data.
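A minimal Python sketch of this count-and-normalize algorithm (the network description format, `dist_of` and `parents_of`, is my own; the example at the bottom uses the five-point genre/rating data set from the lecture):

```python
from collections import Counter, defaultdict

def count_and_normalize(train, dist_of, parents_of):
    """train: list of dicts, each a full assignment variable -> value.
    dist_of[v]: name of the distribution powering variable v.
    parents_of[v]: list of parent variables of v."""
    counts = defaultdict(Counter)
    for example in train:                      # each full assignment
        for var, dist_name in dist_of.items():  # each variable
            parent_vals = tuple(example[p] for p in parents_of[var])
            counts[(dist_name, parent_vals)][example[var]] += 1
    # Normalize each distribution per parent configuration.
    cpds = {}
    for key, cnt in counts.items():
        total = sum(cnt.values())
        cpds[key] = {v: c / total for v, c in cnt.items()}
    return cpds

# Two-variable example from the lecture: G -> R.
train = [{"g": "d", "r": 4}, {"g": "d", "r": 4}, {"g": "d", "r": 5},
         {"g": "c", "r": 1}, {"g": "c", "r": 5}]
cpds = count_and_normalize(train,
                           dist_of={"g": "p_g", "r": "p_r"},
                           parents_of={"g": [], "r": ["g"]})
# cpds[("p_g", ())] == {"d": 0.6, "c": 0.4}
```

Note that parameter sharing falls out for free: pointing two variables at the same distribution name makes their counts land in the same table.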
[00:24:54] Notice I'm going to write a semicolon theta here, to recognize the fact that this Bayesian network depends on the parameters. So this is the likelihood of the data given these parameters, and maximum likelihood says: I want to tweak these parameters so that this likelihood is as large as possible. [00:25:23] So this should look a little bit more like what we were doing in the machine learning modules, where we write down a loss function which depends on parameters, and which is usually a sum over the data, and we try to find the parameters that minimize the loss. Here it's the opposite: we're trying to find the parameters that maximize the likelihood. And if you just take a log and negate it, you actually end up with loss minimization as well, but I will ignore that for now.
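Written out, the maximum likelihood objective just described, and its equivalent loss-minimization form after taking the negative log, are:

```latex
\max_{\theta}\ \prod_{x \in \mathcal{D}_{\text{train}}} P(X = x;\theta)
\quad\Longleftrightarrow\quad
\min_{\theta}\ \sum_{x \in \mathcal{D}_{\text{train}}} -\log P(X = x;\theta)
```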
[00:25:54] i will ignore that for now so intuitively this is um [00:25:57] so intuitively this is um a reasonable principle as well what [00:26:00] a reasonable principle as well what you're trying to do is for every setting [00:26:02] you're trying to do is for every setting of parameters that gives you some [00:26:04] of parameters that gives you some likelihood under the model of the data [00:26:07] likelihood under the model of the data and you just want to keep on tweaking [00:26:09] and you just want to keep on tweaking that until the likelihood as high as [00:26:12] that until the likelihood as high as possible [00:26:14] possible so [00:26:16] so having said that [00:26:18] having said that now i'm just going to claim that that [00:26:20] now i'm just going to claim that that algorithm which we call counter [00:26:21] algorithm which we call counter normalize is exactly solving the maximum [00:26:25] normalize is exactly solving the maximum likelihood objective [00:26:27] likelihood objective so this is really nice because it gives [00:26:29] so this is really nice because it gives us a closed solute form solution [00:26:32] us a closed solute form solution to this maximum likelihood objective you [00:26:34] to this maximum likelihood objective you don't have to take the gradient of this [00:26:36] don't have to take the gradient of this and iterate and worry about convergence [00:26:38] and iterate and worry about convergence also it's just done and this is one of [00:26:41] also it's just done and this is one of the reasons that makes maximum molecular [00:26:43] the reasons that makes maximum molecular estimation invasion networks so um [00:26:46] estimation invasion networks so um scalable and [00:26:48] scalable and you know intuitive is that well it is [00:26:51] you know intuitive is that well it is available [00:26:53] available that was a little bit logical [00:26:56] that was a little bit logical all right so um i haven't justified why [00:26:59] 
[00:27:01] But let me just provide you a little bit of a taste of why this might be the case. So let's take this small data set: d4, d5, and c5. If I write down the maximum likelihood objective (I have two variables here), I'm going to expand that. So I have max over theta, and theta here really is the probability of genre, the probability of rating given that the genre is c, and the probability of rating given that the genre is d. So I have three distributions here that I want to optimize, and I just expand out based on the definition of a Bayesian network. [00:27:52] I have the probability of d times the probability of rating 4 given d, and that is the probability of the first data point; times p of d and p of 5 given d, that's the second data point; and then p of c and p of 5 given c, that's the third data point.
[00:28:12] So I'm multiplying all these probabilities across all the data points, and that is the probability of the data given a particular assignment to the local conditional distributions. [00:28:30] And now, I've color-coded them on purpose, because what we can do is shuffle things around. If you just look at the probability of g (I'm maxing over that), it shows up in these three places, and it doesn't affect anything else, so I can just pull that out. And I can pull the green part out, which is p of r given c, and I can pull the blue stuff out, which is maximizing over p of r given g = d here. [00:29:07] So the punchline here is that we can decompose the maximum likelihood objective, which looks like a big tangled mess, into separate subproblems, one for every distribution and assignment to the parents of a particular variable.
[00:29:28] And now, having done that, I have just one little local optimization problem here, which is basically solved in closed form. I'm not going to do this for you, but you can introduce a Lagrange multiplier for the sum-to-one constraint, take some derivatives and set them to zero, and then you get that the maximum likelihood probability is proportional to the counts here. In this case, what we will estimate is that the probability of d is two thirds, the probability of c is one third, and so on. [00:30:15] Okay, so let me summarize now. We've talked about learning in fully supervised Bayesian networks, where we're observing instances of all the variables here. So one important concept to take away is this idea of parameter sharing.
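For the curious, the Lagrange-multiplier argument the lecturer alludes to goes roughly like this (a sketch, with count notation mine). Each local subproblem has the form

```latex
\max_{p(\cdot)}\ \prod_{v} p(v)^{\mathrm{count}(v)}
\quad \text{subject to} \quad \sum_{v} p(v) = 1 .
```

Taking logs gives the Lagrangian $\sum_v \mathrm{count}(v)\log p(v) + \mu\,(1 - \sum_v p(v))$; setting the derivative with respect to each $p(v)$ to zero yields $p(v) = \mathrm{count}(v)/\mu$, and the sum-to-one constraint forces $\mu = \sum_{v'} \mathrm{count}(v')$, so $p(v) \propto \mathrm{count}(v)$: exactly count and normalize.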
[00:30:34] We have talked about a Bayesian network, which in inference doesn't care about where these parameters come from, but we should really think about each of these nodes as being powered by a particular local conditional distribution, and sometimes two variables could be powered by the same distribution. And again, inference is reading from the parameters, right, while learning is writing to the parameters, in which case it matters where these arrows come from. [00:31:09] Secondly, we looked at the maximum likelihood principle, which is this kind of high-minded principle that says: maximize the likelihood of your data. And we showed that this is equal to the very pragmatic, simple, and intuitive principle of counting and normalizing. And it is this simplicity which makes Bayesian networks, especially naive Bayes, still very practical, useful, and interpretable. That's the end.
================================================================================
LECTURE 039
================================================================================
Bayesian Networks 8 - Smoothing | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=M7rWvN_0xbw
---
Transcript
[00:00:05] Hi. In this module, I'm going to talk about Laplace smoothing for Bayesian networks. So let's review maximum likelihood estimation. [00:00:15] Remember, last time we had an example of a two-variable network: the genre of a movie and the rating of the movie, where their joint distribution is given by the probability of the genre times the probability of the rating given the genre. Now, we don't know these parameters, but we want to estimate them from data. Suppose we gather five data points here. The way that maximum likelihood estimation works is by counting and normalizing. The parameters here are: the probability of g, and for that I'm going to count the number of times each value of g shows up, and normalize.
r given g i'm going to look at the number of times each of [00:01:00] look at the number of times each of these configurations shows up [00:01:02] these configurations shows up and then i'm going to normalize [00:01:04] and then i'm going to normalize each one condition on the value [00:01:10] so if you look at these estimates you [00:01:12] so if you look at these estimates you might notice that there's something [00:01:14] might notice that there's something funny going on here [00:01:16] funny going on here so [00:01:17] so the probability that these [00:01:20] the probability that these parameters assigned to a rating of two [00:01:23] parameters assigned to a rating of two given that there's a comedy is zero it [00:01:26] given that there's a comedy is zero it doesn't show up in this row of this [00:01:28] doesn't show up in this row of this table which means that it's zero [00:01:30] table which means that it's zero so do we really believe this [00:01:33] so do we really believe this just because we didn't see an example of [00:01:35] just because we didn't see an example of a comedy being rated as a two are we [00:01:38] a comedy being rated as a two are we licensed to just give it a probably zero [00:01:40] licensed to just give it a probably zero well that would be very close my dead [00:01:43] well that would be very close my dead so this is a case where maximum [00:01:44] so this is a case where maximum likelihood has overfit [00:01:49] there's a very simple way to fix this [00:01:51] there's a very simple way to fix this called laplace smoothing [00:01:53] called laplace smoothing and the idea is that we're just going to [00:01:55] and the idea is that we're just going to add [00:01:56] add a lambda [00:01:58] a lambda which is some positive value let's say [00:02:00] which is some positive value let's say one to each count so let's do a maximum [00:02:03] one to each count so let's do a maximum likelihood with laplace movement [00:02:06] likelihood 
with laplace movement so training data is the same as before [00:02:08] so training data is the same as before and what we're going to do is for each [00:02:10] and what we're going to do is for each of these local distributions we're going [00:02:13] of these local distributions we're going to preset pre-load a 1 [00:02:16] to preset pre-load a 1 labda more generally into this position [00:02:20] labda more generally into this position and now i'm going to go through the [00:02:21] and now i'm going to go through the training and count as usual so i add [00:02:23] training and count as usual so i add three and i add two [00:02:25] three and i add two and then i'm going to normalize [00:02:27] and then i'm going to normalize over these combined counts [00:02:30] over these combined counts and same with uh the probability of r [00:02:34] and same with uh the probability of r given [00:02:35] given g for each of these [00:02:38] g for each of these configurations so now i have to actually [00:02:40] configurations so now i have to actually instantiate all possible configurations [00:02:43] instantiate all possible configurations i'm going to load a one [00:02:46] i'm going to load a one into each of these counts [00:02:49] into each of these counts and then i'm going to look at my [00:02:50] and then i'm going to look at my training data [00:02:52] training data and i'm going to add two there's two d4s [00:02:55] and i'm going to add two there's two d4s one d5 one c1 and one c5 [00:02:58] one d5 one c1 and one c5 now given these counts i'm going to [00:03:00] now given these counts i'm going to normalize to get my probability limit [00:03:03] normalize to get my probability limit look at all the d's count them up [00:03:05] look at all the d's count them up normalize i get [00:03:07] normalize i get um some [00:03:08] um some of these here and look at all the case [00:03:12] of these here and look at all the case rows where g c [00:03:14] rows where g c i'm going to 
[00:03:16] i'm going to sum in [00:03:18] sum in so now [00:03:20] so now what we revisit our probability estimate [00:03:22] what we revisit our probability estimate of r equals two given uh g equals c [00:03:26] of r equals two given uh g equals c this was zero before [00:03:28] this was zero before but now it's uh [00:03:30] but now it's uh here that's one over seven which is [00:03:32] here that's one over seven which is greater than zero [00:03:34] greater than zero so now because we smooth these estimates [00:03:36] so now because we smooth these estimates now we have a little bit more [00:03:38] now we have a little bit more probability on even [00:03:40] probability on even those outcomes that we've never seen [00:03:43] those outcomes that we've never seen during training [00:03:46] during training the key idea behind maximum likelihood [00:03:48] the key idea behind maximum likelihood laplace smoothing is follows [00:03:51] laplace smoothing is follows so we're going to go through each [00:03:53] so we're going to go through each distribution [00:03:55] distribution and partial assignment [00:03:57] and partial assignment uh to [00:03:59] uh to the parents of a node and the node [00:04:01] the parents of a node and the node itself [00:04:02] itself and we're simply going to add lambda [00:04:04] and we're simply going to add lambda to the count [00:04:07] to the count now we do maximum likelihood estimation [00:04:09] now we do maximum likelihood estimation as usual so we're going to go through [00:04:12] as usual so we're going to go through the training data and increment those [00:04:14] the training data and increment those counts based on what we saw [00:04:16] counts based on what we saw and then we count and normalize and [00:04:17] and then we count and normalize and that's it [00:04:19] that's it so the interpretation that we can place [00:04:22] so the interpretation that we can place on the plot smoothing is it's like we [00:04:24] on the plot 
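The count-and-normalize recipe with preloaded pseudocounts can be sketched as follows; the function name and data layout are mine, but the five data points are the lecture's (two d4s, one d5, one c1 and one c5), and with lambda = 1 the never-seen configuration (c, 2) comes out to 1/7 as in the lecture:

```python
def laplace_estimate(data, genres, ratings, lam=1.0):
    """Maximum likelihood with Laplace smoothing: preload lambda, count, normalize."""
    # P(g): preload lambda for each genre, then add observed counts.
    g_counts = {g: lam for g in genres}
    # P(r | g): preload lambda into every (genre, rating) configuration.
    r_counts = {g: {r: lam for r in ratings} for g in genres}
    for g, r in data:
        g_counts[g] += 1
        r_counts[g][r] += 1
    # Normalize the combined (pseudo + observed) counts.
    p_g = {g: c / sum(g_counts.values()) for g, c in g_counts.items()}
    p_r_given_g = {
        g: {r: c / sum(row.values()) for r, c in row.items()}
        for g, row in r_counts.items()
    }
    return p_g, p_r_given_g

# The lecture's five data points: two d4s, one d5, one c1, one c5.
data = [("d", 4), ("d", 4), ("d", 5), ("c", 1), ("c", 5)]
p_g, p_r = laplace_estimate(data, genres=["d", "c"], ratings=[1, 2, 3, 4, 5])
print(p_r["c"][2])  # 1/7: nonzero even though (c, 2) never occurred
```

Setting `lam=0` recovers plain maximum likelihood, where `p_r["c"][2]` would be exactly zero.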
[00:04:19] The interpretation we can place on Laplace smoothing is that it's as if we hallucinated lambda occurrences of each local assignment. Sometimes these lambda counts are called pseudocounts, because they're not based on the data; they're made-up, virtual counts. So you can think of it as pretending you saw some examples before you saw the data, and then doing maximum likelihood estimation.

[00:04:48] So how much should lambda be? How much smoothing should we have, and how does it interact with the data? There are two observations I want to make. The first is that the more you smooth, which means the bigger lambda is, the closer you push the probability estimates to the uniform distribution. For example, if I smooth with lambda equals one-half and I observe only a d, then the probability estimates are going to be three-quarters and one-quarter, whereas if I smooth with one, then the probabilities are going to be two-thirds and one-third, which is closer to half-half.

[00:05:29] The second observation I want to make is that no matter what you set lambda to, the data wins out in the end. Suppose we only see examples of dramas: if we're smoothing with lambda equals one and we saw only one example of g = d, then again the probability estimates are two-thirds and one-third. But suppose we keep on seeing dramas over and over again, so we see 998 of them. Now if we count and normalize, we get a probability estimate of 0.999 for drama, which is much closer to what seeing only dramas suggests.

[00:06:16] So, to summarize: we looked at Laplace smoothing for avoiding overfitting when estimating Bayesian networks. The key idea is that we preload the counts with a lambda, then go through the training data and add counts based on our data, and then we normalize. So the smoothing pulls us away from zeros toward the uniform distribution, but in the end all the smoothing gets washed out with more data. That's the end.

================================================================================ LECTURE 040 ================================================================================ Bayesian Networks 9 - EM Algorithm | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=CPVFJBd-Qcg --- Transcript

[00:00:06] Hi, in this module I'm going to talk about the EM algorithm for learning Bayesian networks when we have unobserved variables in our training data. So let's start with our familiar movie rating example. Remember this Bayesian network: we have a genre, which could be drama or comedy, and we have two people, Jim and Martha, who are going to produce ratings of this movie, which we'll denote R1 and R2. And before, when we observed all the variables in our training data, we could just use maximum likelihood, which amounts to counting and normalizing.
[00:00:43] But this only works if we observe all the variables in each training example, and data collection is expensive. What happens if we don't observe some of the variables? For example, what happens if we don't know the genre of the movies, and we only observe the pairs of ratings from Martha and Jim? So what can we do in this case? Intuitively, it seems kind of hopeless: how can we learn a Bayesian network relating G and R when we don't even see examples of G? But we'll show that this is actually possible in many cases (certainly not all cases), and that's kind of the magic of EM and of unsupervised learning in general.

[00:01:31] So let's try to approach this problem top-down. What are the principles we have? Well, maximum likelihood served us quite well, so let's see if we can make that work. Generally, we have a set of variables which are hidden, called big H, and we also have some variables, big E, which are observed. In this movie rating example, G is the hidden variable, the two ratings are the observed variables, and we have some little e denoting what they're observed to be. And in this case, remember, we have the set of parameters, which is the probability of G and the probability of R given G.

[00:02:15] The principle of maximum marginal likelihood says: well, just maximize the probability of the data; tweak the parameters to make that probability as high as possible. What this means for us is that we're going to try to find the theta that maximizes the product, over all the observations e in the training data, of the probability of that observation given theta. This looks very much like maximum likelihood, with the exception that we are marginalizing out the hidden variables: just to spell it out, that quantity is really the summation, over possible values h of the hidden variables, of P(H = h, E = e; theta).

[00:03:13] Okay, so this is the principle we want to adhere to. It turns out that the EM algorithm is one way of trying to optimize this objective, but we're going to motivate EM in a more intuitive way. So you should think about EM as a generalization of the k-means algorithm. Remember, in k-means for clustering we also had a similar problem, where we have cluster centroids and cluster assignments, both of which we didn't know. In our case, the cluster centroids are going to be generalized to the parameters of a Bayesian network in general, and the cluster assignments are going to be generalized to the hidden variables.

[00:04:01] So here are the variables we have, E and H, and here is the expectation maximization algorithm, otherwise known as EM. We first initialize the parameters randomly, and then we repeat until convergence, alternating between two steps, the E-step and the M-step. In the E-step, we try to use the parameters to guess the hidden variables: we compute q(h), which is a distribution over the possible values the hidden variables could take on, and this is simply equal to the probability of the hidden variables conditioned on the evidence, or observations, that we saw. Again, this depends on the parameters at the current iteration, and we do this for every possible value of h. How do we do this? Well, we've already seen how to compute these kinds of quantities given a fixed Bayesian network: this is called probabilistic inference. In case H is small, we can just do it by brute force; if the network is an HMM, we can use forward-backward; in general, we can use Gibbs sampling, etc.

[00:05:30] So now what do we have? We have these weights for every h, and now we can create fully observed examples: we pair a particular h with our observations and put a weight next to that example. The important thing is that we now have a set of weighted examples which are fully observed, and we know how to deal with fully observed examples: we can do maximum likelihood. So in the M-step we take these weighted examples, and then we just count and normalize, and that gives us a fresh set of parameters, with which we can go back and repeat the E-step and the M-step over and over again.

[00:06:20] So the EM algorithm is guaranteed to converge to a local optimum, just like k-means, but it can get stuck in a local optimum and not actually solve the global optimization problem.
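The E-step/M-step loop for the genre-and-two-ratings network can be sketched as follows. The function names, data layout, and iteration count are mine; the initialization shown is the one used in the lecture's worked example, and the hidden variable is small enough to enumerate by brute force:

```python
# Sketch of EM for the network G (hidden) -> R1, R2 (observed).
GENRES = ("c", "d")
RATINGS = (1, 2)

def e_step(data, p_g, p_r):
    """Guess the hidden genre: for each example, q(g) = P(g | r1, r2)."""
    weighted = []  # fully observed, weighted examples (g, r1, r2, weight)
    for r1, r2 in data:
        # Brute-force enumeration of the joint over the small hidden variable.
        joint = {g: p_g[g] * p_r[g][r1] * p_r[g][r2] for g in GENRES}
        z = sum(joint.values())  # normalizing constant
        weighted.extend((g, r1, r2, joint[g] / z) for g in GENRES)
    return weighted

def m_step(weighted):
    """Count (fractionally) and normalize, just like maximum likelihood."""
    g_cnt = {g: 0.0 for g in GENRES}
    r_cnt = {g: {r: 0.0 for r in RATINGS} for g in GENRES}
    for g, r1, r2, w in weighted:
        g_cnt[g] += w
        r_cnt[g][r1] += w  # each rating contributes the example's weight
        r_cnt[g][r2] += w
    p_g = {g: c / sum(g_cnt.values()) for g, c in g_cnt.items()}
    p_r = {g: {r: c / sum(row.values()) for r, c in row.items()}
           for g, row in r_cnt.items()}
    return p_g, p_r

# Alternate E-step and M-step; EM converges to a local optimum.
data = [(2, 2), (1, 2)]
p_g = {"c": 0.5, "d": 0.5}
p_r = {"c": {1: 0.4, 2: 0.6}, "d": {1: 0.6, 2: 0.4}}
for _ in range(10):
    p_g, p_r = m_step(e_step(data, p_g, p_r))
```

A Laplace-style lambda could be preloaded into the M-step counts to keep the smoothing benefits from the previous module.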
[00:06:37] So let's do an example. We're just going to do one iteration of EM on our sample Bayesian network. Suppose our training data includes two examples, one where (R1, R2) = (2, 2) and a second where it's (1, 2), and the genre is unobserved. Okay, so suppose we have parameters that look like this: the probability of G is just uniform, and the probability of R given G is given by this table.

[00:07:15] Okay, so now we do the E-step. Remember, the E-step is trying to guess what G is for each of these examples, because we don't know it. So let's look at (2, 2), the first example. Well, G could be either c or d, so there are two possibilities, and for each one I'm going to compute the probability of the joint assignment. By the definition of the Bayesian network, this is the probability of g = c, which is 0.5, times the probability of R1 = 2 given g = c, which is 0.6, times the probability of R2 = 2 given g = c, which is also 0.6; that gives me 0.18. And now we look at the other possibility, g = d: the probability of g = d is 0.5, the probability of R1 = 2 given g = d is 0.4, and the probability of R2 = 2 given g = d is another 0.4. Okay, so now I have these probabilities next to each of the possible extensions of this assignment, and I can normalize: that's how I get my q distribution. If I normalize, I get 0.69 and 0.31; there's more probability mass on g = c than on g = d.

[00:09:19] Okay, so now I move on to the second data point, and I'm going to do the same thing. (1, 2) could go with either c or d, and I compute the probability of each possible assignment to G: I have the probability of g = c, which is 0.5, times the probability of R1 = 1 given g = c, which is 0.4, times the probability of R2 = 2 given g = c, which is 0.6. Analogously, I can compute the same quantity for g = d, and again I normalize, and I get 0.5 and 0.5.

[00:10:06] Okay, so at this point, at the end of the E-step, what I have are four fleshed-out data points. I started with two data points, but they've been expanded into the possible continuations of G, and each data point is weighted by some probability q(g), which is essentially how much I believe that data point is valid, in some sense.

[00:10:36] Okay, so now we move on to the M-step, and the M-step is just going to take these four data points, count them up, and normalize; this should be very familiar. So first we estimate the probability of G. G can take on two values, c and d, so I count them up. How many times did g = c occur? Well, it shows up in the first and third data points, and I just add their weights together, which is 0.69 and 0.5. And what about d? Well, g = d shows up in the second and fourth rows, and that's 0.31 + 0.5. And then I just normalize this into an actual distribution.

[00:11:28] So now I move on to the probability of R given G, and for each possible configuration here I'm going to count. So c1 shows up here once, and that has a weight of 0.5. What about c2? c2 shows up three times: once here with R1, once with R2, and once down here. If I add the weights of those, I get 0.5 + 0.69 + 0.69; notice that the first example is used twice, because it generates the rating two twice from c. Okay, so now I have these counts, and I normalize this distribution to get the distribution of R given g = c.

[00:12:28] So now I move on to what happens when G is d. I look at d1: d1 shows up once here, with weight 0.5. And what about d2? Well, that shows up three times, twice here (2 × 0.31) and then once here, where I have another 0.5. I add, normalize, and I get a distribution.

[00:12:54] So the only difference between maximum likelihood and the M-step is that now I'm adding these fractional counts rather than integer counts, but otherwise the logic and the code are exactly the same.

[00:13:07] So what have we done? Stepping back a little bit: intuitively, we've gone from a preliminary set of parameters to guessing what G is, and then we've used that guess of G to further refine our estimate of the parameters. And you'll see that the parameters over here were 0.4 and 0.6, and now they've been pushed to about 0.2 and 0.8. So in general, EM tends to polarize the probabilities, because that's the best way to maximize the likelihood of the data.
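The arithmetic in this walkthrough can be checked directly; this sketch (variable names are mine) reproduces the E-step weights for the (2, 2) example and the M-step estimate of P(R = 2 | G = c):

```python
# E-step for the first example, (r1, r2) = (2, 2):
jc = 0.5 * 0.6 * 0.6          # P(g=c) * P(r1=2|c) * P(r2=2|c) = 0.18
jd = 0.5 * 0.4 * 0.4          # P(g=d) * P(r1=2|d) * P(r2=2|d) = 0.08
qc = jc / (jc + jd)           # ~0.69, the weight on g = c
qd = jd / (jc + jd)           # ~0.31

# E-step for the second example, (1, 2): the two joints are equal
# (0.5*0.4*0.6 vs. 0.5*0.6*0.4), so its weights are 0.5 and 0.5.

# M-step fractional counts for P(r | g=c): the (2,2) example contributes
# its weight twice (both ratings are 2); the (1,2) example contributes
# 0.5 to r=1 (via R1) and 0.5 to r=2 (via R2).
c1 = 0.5
c2 = 2 * qc + 0.5
p2_given_c = c2 / (c1 + c2)   # ~0.79: pushed from 0.6 toward 0.8
```

This is exactly the "count and normalize" code from maximum likelihood, just with fractional weights in place of integer counts.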
probabilities because that's the best way to maximize [00:13:45] because that's the best way to maximize the likelihood of the data and now this [00:13:49] the likelihood of the data and now this is just one iteration of BM now I would [00:13:51] is just one iteration of BM now I would take these parameters and go through the [00:13:53] take these parameters and go through the same process and go through the same [00:13:54] same process and go through the same process until I've um converted [00:14:01] okay so now let's turn to an interesting [00:14:04] okay so now let's turn to an interesting application of of em and that's [00:14:08] application of of em and that's decipherment so this is an example of a [00:14:10] decipherment so this is an example of a ciphers it's called a copal cipher 105 [00:14:14] ciphers it's called a copal cipher 105 page encrypted volume dating back from [00:14:16] page encrypted volume dating back from the 1730s it looks like this so for a [00:14:19] the 1730s it looks like this so for a long time no one knew what was what [00:14:22] long time no one knew what was what these words were um it was finally [00:14:26] these words were um it was finally cracked in [00:14:27] cracked in 2011 with the help of Em by Kevin Knight [00:14:31] 2011 with the help of Em by Kevin Knight an NLP researcher so the kobio cipher is [00:14:34] an NLP researcher so the kobio cipher is actually very complex so what we're [00:14:36] actually very complex so what we're going to do is motivate the idea of [00:14:39] going to do is motivate the idea of using basan networks for decipherment [00:14:41] using basan networks for decipherment with a simple substitution [00:14:43] with a simple substitution Cipher so the idea behind a substitution [00:14:46] Cipher so the idea behind a substitution Cipher is that suppose you wanted to [00:14:49] Cipher is that suppose you wanted to send an encrypted message to someone so [00:14:51] send an encrypted message to someone 
[00:14:54] You're going to generate a substitution table, which specifies how each letter gets transformed into another letter; the cipher is going to be a permutation of all the letters. Then you have a message you want to send; suppose you want to say "hello world". You use this substitution table and apply it to the plain text to produce a cipher text: h maps to n, e to m, l to y (and l to y again), o to t, and so on. Now you hide the substitution table and hand someone the cipher text, or you put it in a book and bury it for someone to discover later. [00:15:42] So the question is: given only the cipher text, can someone recover the plain text? Importantly, the plain text is obviously unknown, but the substitution table is also unknown. So this is a very challenging problem.
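As an aside, the substitution step just described is easy to sketch in code. The following is an illustrative sketch, not the course's code; the table here is a randomly drawn permutation of the alphabet rather than the specific table shown in the lecture:

```python
import random
import string

def make_substitution_table(seed=0):
    # A substitution cipher is a permutation of the alphabet (plus space).
    # This random table is a stand-in for the lecture's hand-made one.
    alphabet = list(string.ascii_lowercase + " ")
    shuffled = alphabet[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(alphabet, shuffled))

def encrypt(plaintext, table):
    # Apply the substitution letter by letter to produce the cipher text.
    return "".join(table[c] for c in plaintext)

def decrypt(ciphertext, table):
    # With the table in hand, decryption just inverts the permutation.
    inverse = {v: k for k, v in table.items()}
    return "".join(inverse[c] for c in ciphertext)

table = make_substitution_table()
cipher = encrypt("hello world", table)
assert decrypt(cipher, table) == "hello world"
```

Encryption and decryption are inverse lookups into the same permutation; the decipherment problem is hard precisely because the receiver has neither `table` nor its inverse.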
[00:16:06] But let's see how we can use Bayesian networks, in particular HMMs, to try to address this. Remember, to use an HMM you have to think about the generative story of how this data arose. I'm going to model it as follows: I have a sequence of letters which are the plain text, and these are hidden, and a corresponding sequence of characters in the cipher text. I define a joint distribution over all of these by first generating the plain text letters according to a Markov model, via a start distribution and a bunch of transitions, and then for each plain text letter I generate a cipher text letter via some emission. So the parameters of the HMM, remember, are the start probability, the transition probability, and the emission probability. [00:17:12] Intuitively, the transitions are going to
[00:17:16] capture the cohesion of the plain text, because it's actually supposed to be readable and have structure rather than random letters, while the emission distribution is going to capture the substitution table. [00:17:30] So how are we going to estimate this HMM? First of all, we're going to make some simplifying choices here, but we'll show that they're essentially sufficient. We set p_start to uniform; you could be a little more clever, but I'll just leave it alone for simplicity. Then the transition probabilities: these specify a bigram model over characters, and this model tells you what looks like English and what doesn't. The really cool thing is that if we know the plain text is supposed to be English, we can just grab a ton of English text and estimate
[00:18:22] a distribution over that text, and that gives us p_trans; we don't even look at the cipher text. Finally, the key part is that the emission distribution is the substitution table, and that's what we're going to estimate with EM. Notice that p_emit is actually more general than a substitution: it says that for every plain text character we can generate a distribution over cipher text letters, whereas a substitution table says there's exactly one. This is more out of convenience, because it makes the optimization easier, but in principle you could also think of p_emit as constrained to a one-to-one mapping. [00:19:07] Okay, so why do we think this will work, intuitively? Well, the transition distribution, which we've already estimated on English, is going to favor plain text that looks like English.
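The count-and-normalize estimate of the bigram transition model from fully observed English text might look like the following sketch. The tiny corpus string is a stand-in for the course's lm.train file, and the add-one smoothing is my addition (not from the lecture) to avoid zero-probability rows:

```python
import string

ALPHABET = string.ascii_lowercase + " "  # 26 letters plus space, K = 27
K = len(ALPHABET)

def estimate_transitions(text):
    # Count consecutive character pairs (h1 -> h2) in fully observed text...
    counts = [[0.0] * K for _ in range(K)]
    ids = [ALPHABET.index(c) for c in text]
    for h1, h2 in zip(ids, ids[1:]):
        counts[h1][h2] += 1
    # ...then normalize each row into a conditional distribution p(h2 | h1),
    # with add-one smoothing so unseen pairs keep a small probability.
    probs = []
    for row in counts:
        total = sum(row) + K
        probs.append([(c + 1) / total for c in row])
    return probs

trans = estimate_transitions("the quick brown fox jumps over the lazy dog")
```

Each row of `trans` is the distribution over the next character given the current one, which is exactly the p(h2 | h1) array layout the live-coding below uses.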
[00:19:27] Meanwhile, the emission distribution is going to try to favor consistent character substitutions: we don't want 'a' to map to a 't' here, a 'v' there, and an 'f' somewhere else. We want some consistency, and by having this emission distribution and maximizing likelihood, EM is going to encourage that kind of consistency. So we have these two forces at play with each other while we're trying to estimate both the hidden variables and the parameters. [00:19:56] Okay, so let's actually step into the EM algorithm and see what computations are needed to estimate this HMM. In the E-step, I need to compute the distribution over the hidden variables conditioned on the observations, and to do that we introduced the forward-backward algorithm a while back. The forward-backward algorithm computes these smoothing queries, which is exactly: what's the probability of a plain text letter being a
[00:20:37] particular value h, given the cipher text that we observe? I'm going to do this for each position in the cipher text and every potential character, so I define q_i(h) to be this probability. This is my best guess, at a particular location, of what I think the plain text character is. Now, given these guesses, the M-step is going to re-estimate the substitution table, i.e. the emission distribution: I compute a fractional count and normalize, over all character pairs (h, e). For every possible plain text letter h and every cipher text letter e, I look at all the positions i where the cipher text was actually e, and I add the weight q_i(h). This tells me how many times, in expectation, we believe that a particular plain text letter and
[00:21:59] a particular cipher text letter occur together. Then I just normalize this distribution: p_emit of a cipher text letter given a plain text letter is proportional to this count, count_emit(h, e). [00:22:14] Okay, so that's it: we just run the EM algorithm and hope for the best. To make this a little more exciting, I'm going to try to code this up in Python so we can see it in action. All right, a few things first. Here is our cipher text; you shouldn't be able to read it, but we're going to try to decipher it. And we also have this lm.
train, which is a file containing a large amount of English text that we can draw from. [00:22:55] We also have a utility file, which I'll just review: it lets you read text, we convert text into a sequence of integers for simplicity, and importantly we have implemented the forward-backward algorithm, which takes a sequence of observations and the parameters of the HMM and returns Q, a two-dimensional array where Q[i] is a distribution over the possible values of h_i for each position i. [00:23:29] Okay, so let's decipher some cipher text. I import the utilities, and I declare K to be the number of characters; this is the lowercase letters plus space (I've normalized the text). The first thing I want to do is initialize the HMM.
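The forward-backward routine is treated as a given utility in the lecture; a plausible minimal implementation of the smoothing computation it is described as performing (my reconstruction, not the course's actual utility code) is:

```python
def forward_backward(obs, start_probs, trans_probs, emit_probs):
    """Return Q, where Q[i][h] = P(H_i = h | all observations).

    start_probs: list of K floats; trans_probs, emit_probs: K x K, row-stochastic.
    """
    K = len(start_probs)
    n = len(obs)
    # Forward pass: alpha[i][h] proportional to P(H_i = h, e_1..e_i).
    alpha = [[0.0] * K for _ in range(n)]
    for h in range(K):
        alpha[0][h] = start_probs[h] * emit_probs[h][obs[0]]
    for i in range(1, n):
        for h2 in range(K):
            alpha[i][h2] = emit_probs[h2][obs[i]] * sum(
                alpha[i - 1][h1] * trans_probs[h1][h2] for h1 in range(K))
        # Normalize each step to avoid numerical underflow on long sequences.
        z = sum(alpha[i]) or 1.0
        alpha[i] = [a / z for a in alpha[i]]
    # Backward pass: beta[i][h] proportional to P(e_{i+1}..e_n | H_i = h).
    beta = [[1.0] * K for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for h1 in range(K):
            beta[i][h1] = sum(trans_probs[h1][h2] * emit_probs[h2][obs[i + 1]]
                              * beta[i + 1][h2] for h2 in range(K))
        z = sum(beta[i]) or 1.0
        beta[i] = [b / z for b in beta[i]]
    # Combine and normalize: Q[i][h] = alpha[i][h] * beta[i][h] / Z_i.
    Q = []
    for i in range(n):
        row = [alpha[i][h] * beta[i][h] for h in range(K)]
        z = sum(row) or 1.0
        Q.append([r / z for r in row])
    return Q

# Tiny two-state demo: state 0 prefers symbol 0, state 1 prefers symbol 1.
Q = forward_backward([0, 0, 1], [0.5, 0.5],
                     [[0.9, 0.1], [0.1, 0.9]],
                     [[0.8, 0.2], [0.2, 0.8]])
```

Each row of Q sums to one, giving the per-position posterior over hidden characters that the E-step below consumes.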
the parameters of hmm I [00:24:03] hmm so remember the parameters of hmm I have start um [00:24:06] have start um probabilities so this is going to be P [00:24:09] probabilities so this is going to be P start of H and I'm just going to set [00:24:12] start of H and I'm just going to set this to the uniform distribution and all [00:24:14] this to the uniform distribution and all the DAT so start probs equals 1 / k [00:24:19] the DAT so start probs equals 1 / k for um H and range uh [00:24:24] for um H and range uh K so that's going to be just the uniform [00:24:27] K so that's going to be just the uniform distribution [00:24:32] okay so now what about the transition [00:24:35] okay so now what about the transition probabilities so transition probably [00:24:37] probabilities so transition probably goes from H1 to [00:24:38] goes from H1 to H2 um and this is p of H2 given H1 so [00:24:43] H2 um and this is p of H2 given H1 so not the order is Switched here because I [00:24:46] not the order is Switched here because I want transition problems of each one to [00:24:48] want transition problems of each one to be actually an array which specifies the [00:24:50] be actually an array which specifies the distribution ARR [00:24:52] distribution ARR two um so here we're going [00:24:55] two um so here we're going to uh estimate this from um plain [00:25:01] to uh estimate this from um plain text so I'm going to have raw text this [00:25:05] text so I'm going to have raw text this is I'm going to read it from lm. [00:25:08] is I'm going to read it from lm. 
train, which we saw earlier, and convert it into an integer sequence. Let's see what that looks like: just a sequence of integers. [00:25:25] Okay, so now I estimate p_trans from this raw text. This is just a standard, fully observed estimation problem. I loop over all positions from one to the end, take h1 and h2 to be consecutive characters in the raw character sequence, and increment a counter. I define transition_counts to be zero for each h1 in range(K) and each h2 in range(K), so that's a K-by-K zero matrix, and I increment the count once for each consecutive pair. Then I normalize:
[00:26:38] the way I normalize is to define the transition probabilities as, for each h1, the result of calling normalize on transition_counts[h1]. For every h1 this gives me a distribution over h2; normalizing it gives my transition probabilities, and I'm done with transitions. [00:27:05] So what about the emission probabilities? Here, for every h I have a distribution over e; to write it out in our mathematical language, this is p(e | h). I just want to initialize it to the uniform distribution. To document this a little: the start distribution is uniform and done; the transitions are estimated from plain text and done; and the emissions are what we're going to estimate, so this is just an initialization. [00:27:48] I initialize it to, for each h
[00:27:58] in the domain of H and for each e in the same domain, the value 1/K, so this is a uniform distribution. And now I'm going to run EM to estimate only this emission distribution. [00:28:16] Okay, so to run EM, I load my cipher text: observations = read the cipher text file, converted into an integer sequence. Then I iterate a number of times, let's just say 200, and in each iteration I do the E-step and the M-step. So what happens in the E-step? I use my current setting of the parameters to guess what the plain text is, so I run forward-backward on the observations and pass in the parameters of the HMM. [00:29:18] This returns Q; just to note, in mathematical notation, Q[i][h] is the probability
that hi equal H um given the [00:29:35] at uh that hi equal H um given the evidence which is observations [00:29:38] evidence which is observations here print out best guess so far so [00:29:41] here print out best guess so far so let's see how we're doing we're going to [00:29:42] let's see how we're doing we're going to do this at each [00:29:44] do this at each iteration um so to do this so for [00:29:48] iteration um so to do this so for every um let's define n equals the [00:29:51] every um let's define n equals the length of uh the number of observations [00:29:54] length of uh the number of observations here so for each um [00:29:59] here so for each um position I'm going to look at Qi so this [00:30:02] position I'm going to look at Qi so this gives me a distribution over H and I'm [00:30:06] gives me a distribution over H and I'm going to take the one that has the [00:30:08] going to take the one that has the highest [00:30:11] highest probability so then I'm going to convert [00:30:14] probability so then I'm going to convert this to um [00:30:19] string and print it [00:30:24] out [00:30:26] out okay and now that finally the M step is [00:30:31] okay and now that finally the M step is we're just going to count and normalize [00:30:34] we're just going to count and normalize here so I'm going to define a new [00:30:38] here so I'm going to define a new temporary variable which is emission [00:30:40] temporary variable which is emission counts and this is going to be [00:30:44] counts and this is going to be um let me just actually cheat a little [00:30:47] um let me just actually cheat a little bit and I'm going to call emission [00:30:49] bit and I'm going to call emission counts to be zero for um the same [00:30:54] counts to be zero for um the same dimensionality as Mission props this is [00:30:57] dimensionality as Mission props this is a matrix of zero [00:31:00] okay so now we're going to up go through [00:31:03] okay so now we're going to up go 
[00:31:09] through each position i in range(n), and for each possible value h at that position, I update the emission counts. Emission, remember, is indexed by (h, e), so the update is emission_counts[h][observations[i]] += Q[i][h]. This is probably the most important line here: Q[i][h] is the weight on a particular h at position i, and that weight gets added to the count for h paired with the observation at that position. [00:31:59] Okay, so now all I need to do is normalize: the emission probabilities are, for each possible value of h, the normalization of emission_counts[h]. [00:32:16] Okay, so that's it. Just to review this briefly: I first initialize the HMM, with the start probabilities just uniform, and then I estimate the transition
[00:32:36] probabilities in a fully supervised way from plain text, where I simply count and normalize. Then I initialize the emission probabilities to uniform for now, and I run the EM algorithm to actually update these emission probabilities. So I read in the observations and iterate between the E-step and the M-step, where in the E-step I run forward-backward to compute the distribution over the possible values of h at each position and print out my best guess, and then in the M-step I count and normalize. [00:33:17] All right, so let's see how this does: let's run the decipher script. At each step it prints out its best guess, and over time you can see that this jumble of letters slowly evolves as EM tries to figure out both the plain text and the substitution table.
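The count-and-normalize M-step just summarized can be isolated as a small function. In this sketch, Q is a hand-made stand-in for the forward-backward output, and K = 2 symbols keeps the toy example readable:

```python
def m_step_emissions(obs, Q, K):
    # Fractional count: each position i contributes weight Q[i][h] to the
    # (plain letter h, cipher letter obs[i]) cell...
    counts = [[0.0] * K for _ in range(K)]
    for i, e in enumerate(obs):
        for h in range(K):
            counts[h][e] += Q[i][h]
    # ...then each row is normalized into p_emit(e | h).
    probs = []
    for row in counts:
        z = sum(row) or 1.0
        probs.append([c / z for c in row])
    return probs

# Toy example: 3 observed symbols and a made-up posterior Q over 2 plain letters.
obs = [0, 1, 0]
Q = [[0.9, 0.1], [0.2, 0.8], [0.9, 0.1]]
emit = m_step_emissions(obs, Q, K=2)
```

Because the posterior puts most weight on plain letter 0 wherever cipher symbol 0 appears, the re-estimated row emit[0] concentrates on symbol 0, which is the consistency pressure the lecture describes.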
[00:33:46] This isn't going to be perfect, because we've used a fairly simple model and we don't have much data, but you can see some structure emerging: "so i w my woke alone without" — "without", so that's a real word — "any one that i could really", and so on, and "plain", there's probably something there. [00:34:08] Just for comparison, here is the actual plain text; it's a little passage from The Little Prince: "So I lived my life alone, without anyone that I could really talk to, until I had an accident with my plane." So definitely far from perfect, but given that we just did it in ten minutes, it's maybe not bad. [00:34:34] Okay, so let me summarize. We've presented the EM algorithm for estimating the parameters of a Bayesian network when there are unobserved variables. The overarching principle is that of maximum marginal likelihood: we find the parameters that drive up the probability of the variables that we did observe
variables that we did observe as much as possible [00:34:58] much as possible so the EM algorithm is going to optimize [00:35:02] so the EM algorithm is going to optimize the marginal likelihood objective but [00:35:05] the marginal likelihood objective but fundamentally it's a chicken and egg [00:35:07] fundamentally it's a chicken and egg problem just like in kme right we don't [00:35:09] problem just like in kme right we don't know the hidden variables and we also [00:35:11] know the hidden variables and we also don't know the parameters so what we're [00:35:14] don't know the parameters so what we're going to do is to iterate between one [00:35:17] going to do is to iterate between one and the other so in the Eep we're going [00:35:20] and the other so in the Eep we're going to perform problemistic inference given [00:35:22] to perform problemistic inference given a fixed set of parameters to produce my [00:35:25] a fixed set of parameters to produce my our best guess over what some of the [00:35:28] our best guess over what some of the Hidden variables are and then in the M [00:35:30] Hidden variables are and then in the M step we're going to use these these [00:35:34] step we're going to use these these probabilities as weights of examples and [00:35:37] probabilities as weights of examples and then we're just going to count and [00:35:38] then we're just going to count and normalize to parameters and then we go [00:35:42] normalize to parameters and then we go estimate the hidden variables and [00:35:43] estimate the hidden variables and estimate the parameters and so on and so [00:35:46] estimate the parameters and so on and so forth so finally once you've learned [00:35:49] forth so finally once you've learned your beija network you can go off and [00:35:51] your beija network you can go off and perform inference and answer all sorts [00:35:53] perform inference and answer all sorts of questions which could involve asking [00:35:56] of questions which 
could involve asking about these unobserved variables that [00:35:59] you didn't see on new test examples or [00:36:01] it could be used to ask questions about [00:36:04] the observed [00:36:06] variables given some other variables [00:36:09] and in general this highlights kind of [00:36:11] the flexibility of Bayesian networks just [00:36:13] because you had a certain pattern of [00:36:16] missingness at training time doesn't mean [00:36:18] you have to commit to that at test [00:36:21] time so there's many applications of Bayesian [00:36:24] networks involving the EM [00:36:27] algorithm we looked at decipherment [00:36:28] where the goal is to infer the plain [00:36:30] text from the cipher text EM could also [00:36:33] be used to reconstruct phylogenetic [00:36:35] trees given the DNA of modern [00:36:38] organisms and it can also be used to [00:36:41] infer the unknown label of a data point [00:36:45] where the observations are the possibly [00:36:47] noisy labels provided by crowd workers [00:36:51] so finally EM is the most canonical [00:36:53] version of a broader class of techniques [00:36:56] called variational inference
[00:36:58] which actually includes things like [00:36:59] variational autoencoders which some of [00:37:01] you might have heard of um in that case [00:37:05] the q is actually the encoder and it's [00:37:09] given by a neural network and the [00:37:11] decoder is the Bayesian network so there's [00:37:15] a lot more connections to be explored [00:37:18] and I encourage you to read up on this [00:37:21] by yourself ================================================================================ LECTURE 041 ================================================================================ Logic 1 - Overview: Logic Based Models | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=oM5LUGPO7Zk --- Transcript [00:00:06] okay hi everyone so this week we are [00:00:09] going to be talking about logic so [00:00:11] this is our last set of modules and [00:00:13] we're going to switch from [00:00:15] variable-based models and start talking [00:00:17] about logic [00:00:19] so let's start with a question so let's [00:00:21] start with this question if x1 plus x2 [00:00:24] is equal to 10 and x1 minus x2 is equal [00:00:27] to 4 what is x1 okay [00:00:30] so [00:00:31] think about this for a few seconds so [00:00:33] the way you would go
about this is [00:00:35] you'll probably like use the thing that [00:00:36] you've used in algebra and then [00:00:38] basically like cancel out the x2s [00:00:41] and you would have like 2x1 equal to 14 [00:00:43] divide 14 by 2 and you'll end up getting [00:00:46] seven right [00:00:47] another way of looking at this problem [00:00:49] is you can think of this as a factor [00:00:50] graph this is actually a factor graph [00:00:52] and we have these constraints and one [00:00:54] way of solving this is to go and do [00:00:56] backtracking search and then actually [00:00:58] try to figure out the satisfying [00:01:00] assignment there but the problem [00:01:02] there is that might not be the most [00:01:04] efficient way of doing it and kind of [00:01:06] like the trick that you've learned in [00:01:07] algebra is probably a more efficient [00:01:09] way of dealing with this question so [00:01:12] that is kind of like a motivation for [00:01:14] why we are talking about logic could we [00:01:16] do logical inferences in a way that [00:01:18] makes our lives much easier and then [00:01:21]
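[Editor's aside: the two strategies just contrasted, algebraic manipulation versus searching for a satisfying assignment, can be sketched directly. The nested loop stands in for backtracking search; the 0..10 integer domain is my assumption, not the lecture's.]

```python
# Two ways to answer "if x1 + x2 == 10 and x1 - x2 == 4, what is x1?".

# Strategy 1: algebraic manipulation. Adding the two equations cancels x2,
# leaving 2 * x1 == 14, so x1 == 7.
x1_algebra = (10 + 4) // 2

# Strategy 2: search over assignments, checking both constraints, the way a
# naive CSP solver would. (Assumed domain: integers 0..10 for each variable.)
solutions = [(x1, x2)
             for x1 in range(11)
             for x2 in range(11)
             if x1 + x2 == 10 and x1 - x2 == 4]
```

Both strategies agree, but the algebraic one does constant work while the search enumerates the whole domain, which is the inefficiency the lecture points out.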
allow us to talk about expressions much [00:01:23] more concisely and allow us to move [00:01:25] around symbols and come up with [00:01:27] decisions come up with solutions based [00:01:30] on these sorts of logical inferences so [00:01:32] have this example in your mind [00:01:34] throughout the lecture because we're [00:01:36] using similar types of ideas when we talk [00:01:38] about logic and then doing inference in [00:01:40] logic okay [00:01:42] so if you remember our course plan right [00:01:44] we started with machine learning and [00:01:46] then we talked about in general reflex [00:01:47] and reflex-based models where [00:01:50] we have a low level of intelligence and [00:01:52] we started adding to these levels of [00:01:54] intelligence thinking about state-based [00:01:56] models and then thinking about variable [00:01:58] based models and finally we are [00:02:00] at logic when we are talking about a [00:02:02] higher level of intelligence and [00:02:04] expressivity when we think about [00:02:06] AI systems and just taking a step back [00:02:09] and
thinking about the paradigms that [00:02:11] we've used in this class we started [00:02:13] thinking about learning and modeling and [00:02:15] inference so the idea was we have some [00:02:17] sort of data and from that data we're going [00:02:19] to learn a model we're going to learn [00:02:22] this representation and then we're going [00:02:24] to be able to do inference on that model [00:02:26] so once we have a model once we have an [00:02:28] MDP [00:02:29] once we have a search problem we can [00:02:30] basically ask questions and that's [00:02:32] inference right you can basically ask [00:02:34] questions and infer an answer and [00:02:36] that allows us to think about different [00:02:38] types of inference algorithms okay [00:02:41] so as examples of that we talked about [00:02:43] search problems so when we have a search [00:02:45] problem the inference [00:02:47] problem that we were thinking about was [00:02:49] finding a minimum cost path or we were [00:02:51] talking about MDPs so in MDPs for [00:02:54] example or games we were thinking about [00:02:55] maximum value policies or we looked
[00:02:58] at CSPs or Bayesian networks where we're [00:03:01] looking at basically like what is the [00:03:03] probability of some query conditioned on [00:03:05] some sort of evidence so [00:03:07] these are some examples of inference [00:03:09] questions and inference problems that we [00:03:11] have looked at [00:03:12] throughout the different [00:03:14] lectures and modules that we have seen [00:03:17] so in modeling when you think about [00:03:19] modeling paradigms when we [00:03:21] have state-based models we thought about [00:03:23] search problems MDPs and games and [00:03:26] we basically thought about these in [00:03:28] terms of states actions and costs [00:03:30] right so those were kind of the main [00:03:32] core elements that would come into [00:03:35] our modeling when we were [00:03:36] thinking about state-based models and [00:03:38] applications of that were things of the [00:03:40] form of route finding or playing games [00:03:44] and then when we started thinking about [00:03:45] variable-based models we [00:03:47]
started defining this idea of variables and [00:03:49] factors and constraints between them and [00:03:52] then we talked about CSPs we talked [00:03:53] about Bayesian networks Markov networks [00:03:56] and applications of that were things [00:03:58] that were easier to think about [00:04:00] in terms of variables so we talked about [00:04:02] scheduling or tracking or medical [00:04:04] diagnosis where we have dependencies [00:04:07] between these different variables and [00:04:09] that was in variable-based models so [00:04:12] this week what we want to do is we [00:04:13] want to talk about logic-based models [00:04:16] and in logic-based models [00:04:18] similar to state-based models and [00:04:20] variable-based models we're going to [00:04:21] define a few different types of logic [00:04:23] that we are going to be using so [00:04:25] specifically we are going to be talking [00:04:27] about propositional logic and first [00:04:29] order logic and we're going to think in [00:04:32] terms of logical formulas and in [00:04:34] manipulating these logical formulas and [00:04:36] how we can infer new
formulas from them [00:04:39] so specifically how we think about [00:04:41] inference rules [00:04:42] and what are some applications of logic [00:04:44] well logic shows up in a variety of [00:04:46] applications starting from theorem [00:04:47] proving hardware and software [00:04:49] verification and also in general [00:04:52] reasoning it's a core element of [00:04:54] reasoning in artificial intelligence [00:04:57] so historically if you think about logic [00:04:59] old AI was very [00:05:02] highly dependent on logic so logic was [00:05:04] dominant in AI before the 1990s so the [00:05:08] same sort of excitement the same sort of [00:05:10] hype that is around deep learning today [00:05:12] that same sort of hype was around logic [00:05:15] before the 1990s and that was kind of like [00:05:17] the core of AI people were thinking [00:05:19] logic is going to really give us an [00:05:20] understanding of artificial intelligence [00:05:23] and developing artificial intelligence [00:05:25] that could really achieve [00:05:26] things that humans can [00:05:28] but
that didn't really pan out and [00:05:30] the reason it didn't pan out was logic [00:05:32] had a few problems so the first [00:05:34] problem was logic was deterministic [00:05:37] right and it couldn't really [00:05:39] handle uncertainty and that gave rise to [00:05:42] things of the form of probabilistic models [00:05:43] and friends and in general [00:05:45] understanding probabilities and adding [00:05:47] uncertainty on top of logic or [00:05:49] developing models that can capture [00:05:50] uncertainty beyond logic [00:05:53] the second problem with logic-based [00:05:55] models was that they were very rule-based [00:05:57] and they wouldn't allow [00:05:59] fine-tuning based on data so because [00:06:01] of that they were very brittle so [00:06:03] if I have new data that comes in and [00:06:06] tells me something else then that [00:06:07] rule-based model is not going to be able [00:06:09] to capture that and it's really hard to [00:06:11] incorporate information coming from new [00:06:14] data and again that gives rise to [00:06:15] machine learning and this [00:06:17]
idea of data-driven models and looking at data [00:06:20] and being able to learn [00:06:22] new models and being able to do [00:06:24] inference from that perspective okay [00:06:26] so these are kind of like the [00:06:28] problems one and two the weaknesses of [00:06:30] logic but in general logic has some sort [00:06:33] of strength that some of the models [00:06:35] today like some of the [00:06:36] state-of-the-art models don't [00:06:38] really have and I think there [00:06:39] is really an opportunity here to use [00:06:42] ideas from logic in some of the more [00:06:43] modern machine learning systems or some [00:06:45] of the more modern AI systems and the [00:06:47] strength of logic is expressivity so [00:06:51] we're going to be talking about this [00:06:53] throughout this week in general but [00:06:55] the nice thing that logic gives us is it [00:06:57] provides a compact way of expressing [00:07:00] models expressing representations that [00:07:03] we wouldn't normally be able to get [00:07:05] so this compact [00:07:07]
representation can be really powerful because we [00:07:08] could manipulate that compact [00:07:10] representation and that could allow us [00:07:13] to move on [00:07:14] to be able to infer new ideas and [00:07:17] new rules and so on and in general [00:07:19] that expressivity is a big strength [00:07:22] of logic and the reason that it is still [00:07:24] around and there is still excitement [00:07:25] around using it [00:07:28] all right so let me motivate [00:07:30] logic with an example so we've looked at [00:07:32] this example I think Percy [00:07:33] showed this example during the first [00:07:35] lecture where [00:07:37] our goal is we want to have a smart [00:07:39] personal assistant so let's say you're [00:07:41] sitting on the beach and this is after [00:07:42] the class and you're on vacation [00:07:44] after COVID and we're sitting on a [00:07:46] beach and we have a personal assistant [00:07:49] and what we want to do is we want to ask [00:07:51] our personal assistant maybe it's Siri [00:07:53] or maybe there's something fancier than [00:07:55] Siri and we
want to ask our personal [00:07:57] assistant a set of questions or maybe [00:07:59] you want to tell it some [00:08:00] information maybe you want to inform it [00:08:02] about something or ask it questions [00:08:04] okay so let's say we use natural [00:08:08] language so let's start with natural [00:08:09] language as a medium for talking to this [00:08:12] personal assistant okay so let's look [00:08:15] at an example here so this was the [00:08:17] system so let's say that this is my [00:08:19] system and I tell my system [00:08:21] all students like CS221 okay so [00:08:27] I'm telling it some information and then [00:08:29] my personal assistant says I learned [00:08:31] something okay [00:08:32] then I can say [00:08:34] Bob does not like [00:08:37] CS221 okay and then it would be like I [00:08:41] learned something and now based on this [00:08:43] knowledge that it has let's call [00:08:45] that a knowledge base so based on this [00:08:47] knowledge base that it has I can ask [00:08:49] this personal assistant questions I can [00:08:51] ask is [00:08:52] Bob a student [00:08:55] what
should it answer [00:08:57] so if it actually does inference right [00:08:59] it should answer no [00:09:01] right like if it has a set of [00:09:03] formulas and based on those it can infer [00:09:05] and it can reason then it should [00:09:06] actually be able to answer that [00:09:08] question underneath here there are a [00:09:11] bunch of formulas and there are a [00:09:13] bunch of inference rules we are going to [00:09:14] be talking about that but we [00:09:16] could take a look at that and see what [00:09:18] are the formulas it has access to and [00:09:20] what are the things that it is inferring [00:09:21] and it is inferring that Bob is not [00:09:24] a student here based on the things [00:09:26] that I've told it well this is kind of [00:09:28] the environment that we're [00:09:30] going to be talking about throughout [00:09:32] the lectures and other modules this week [00:09:34] all right so [00:09:36] let's go back here okay [00:09:38] so in general when we think about [00:09:41] having this personal
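[Editor's aside: the Bob exchange above is the kind of reasoning a knowledge base supports. A minimal sketch, my own toy encoding and not the course's inference engine, instantiates "all students like CS221" for Bob as a rule and answers the query by modus tollens.]

```python
# Toy knowledge base for the assistant example. The rule instantiates
# "all students like CS221" for bob: Student(bob) -> Likes(bob, cs221).
# All names here are illustrative, not from the lecture's slides.

facts = {"Likes(bob, cs221)": False}             # told: Bob does not like CS221
rules = [("Student(bob)", "Likes(bob, cs221)")]  # (premise, conclusion) pairs

def ask(query):
    """Return True/False if the KB entails an answer, else None ("I don't know")."""
    if query in facts:
        return facts[query]
    for premise, conclusion in rules:
        # Modus ponens: premise true, so conclusion is true.
        if query == conclusion and facts.get(premise) is True:
            return True
        # Modus tollens: conclusion false, so premise must be false.
        if query == premise and facts.get(conclusion) is False:
            return False
    return None
```

Asking `ask("Student(bob)")` returns `False`, matching the "no" the lecture says a reasoning assistant should give, while a query about an unknown individual returns `None`.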
assistant where we're using natural language or [00:09:44] where we're using logic the [00:09:46] idea is it should be able to digest [00:09:48] heterogeneous information and it should [00:09:51] also be able to reason deeply about that [00:09:53] information right it can't have just [00:09:55] shallow knowledge of that information it [00:09:57] should be able to do inference it should be [00:09:58] able to connect these different pieces [00:10:00] of information and make logical [00:10:02] statements based on that make logical [00:10:04] moves based on them okay so why [00:10:07] should we use natural language that's a [00:10:09] good question why natural language or [00:10:12] anything else okay [00:10:14] so natural language is kind of nice we [00:10:17] all like speaking natural language I'm [00:10:18] talking to you guys in natural language [00:10:20] it's a very nice medium to use when we [00:10:23] talk to personal assistants or when we [00:10:26] want to basically express [00:10:28] what we would like to say so it's a [00:10:30] very rich medium for expressing what we
[00:10:32] want [00:10:33] and because it is rich we can say [00:10:35] things like I don't know a dime is [00:10:37] better than a nickel and we can say [00:10:39] things like a nickel is better than a [00:10:41] penny [00:10:42] and based on that we would be able [00:10:45] to make expressions we could make [00:10:46] logical statements based on that and say [00:10:48] therefore a dime is better than a penny [00:10:50] which makes sense [00:10:52] but the problem with natural language is [00:10:55] it's also a little bit slippery like [00:10:57] I can start with something that says [00:11:00] a penny is better than nothing that's [00:11:02] okay and then I would have another [00:11:04] statement that just says nothing is [00:11:06] better than world peace and that's [00:11:07] perfectly fine and putting these two [00:11:09] together I can come up with a logical [00:11:13] kind of statement based on [00:11:15] what I've seen that a penny is better [00:11:17] than world peace which sounds a little [00:11:19] bit weird and not correct and not
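[Editor's aside: the slipperiness comes from natural language letting two different uses of "nothing" look like the same symbol. If we naively encode all four sentences as facts about one "better than" relation and close it under transitivity, we derive exactly the absurd conclusion; this toy encoding is mine, not from the lecture.]

```python
# Naive transitive reasoning over "better than" statements taken as symbols.
# "nothing" means different things in the two sentences, but the encoding
# can't tell, so transitivity yields nonsense.
better_than = {("dime", "nickel"), ("nickel", "penny"),
               ("penny", "nothing"), ("nothing", "world peace")}

def chains_to(a, b):
    """Does a chain of 'better than' facts connect a to b? (Assumes no cycles.)"""
    if (a, b) in better_than:
        return True
    return any(x == a and chains_to(y, b) for x, y in better_than)
```

Here `chains_to("dime", "penny")` is the sensible conclusion, and `chains_to("penny", "world peace")` is the weird one, which is the argument for a formal language where symbols have fixed meanings.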
the thing that I actually wanted okay [00:11:25] so even though natural language is [00:11:27] pretty rich [00:11:28] when we are thinking about logical [00:11:30] statements and making logical [00:11:34] inference [00:11:35] and following inference rules [00:11:37] it feels like natural language is a [00:11:39] little bit slippery and you might want [00:11:40] to have access to some other type of [00:11:42] language so this language like when we [00:11:45] talk about language it doesn't need to [00:11:46] be natural language right language is [00:11:48] just a mechanism for expressing things [00:11:51] it's just a way of expressing okay so [00:11:54] natural language is an example of a [00:11:57] language that allows us to express [00:11:59] things it's kind of informal [00:12:01] we also have programming languages [00:12:03] those are kind of formal like we have [00:12:05] Python or C++ [00:12:07] in addition to this we can have logical [00:12:09] languages and the nice thing about [00:12:11] logical languages is that they're formal [00:12:13] and we can think about the relationship [00:12:14]
between them and formal [00:12:16] connections between them but the other [00:12:18] nice thing about logical languages is [00:12:20] they're actually closer to natural [00:12:22] language than let's say programming [00:12:24] languages because they're declarative so [00:12:26] there is actually a connection [00:12:28] between natural language and logical [00:12:30] languages and in one of the later [00:12:31] modules we're actually going to talk [00:12:33] about how we can write expressions in [00:12:35] first-order logic if we have a natural [00:12:37] language statement so [00:12:40] in this lecture this week we are [00:12:42] going to be talking about two types of [00:12:43] logical languages propositional logic [00:12:45] and first-order logic [00:12:48] all right so what is the goal of a [00:12:51] logical language so the goal here is to [00:12:54] be able to represent knowledge right you [00:12:56] want to be able to represent knowledge [00:12:57] about the world but that is not the only [00:12:59] goal in addition to that you want to be [00:13:01] able to reason about that
knowledge. Right, it's not just about representing; it's about how we can make logical statements, how we can run inference rules, and how we can make new statements and reason about them. So an example: if I tell you it's raining and it is wet, you should be able to reason about that and figure out that, well, it is raining, right? You're telling me raining and wet, so both of those statements are definitely true, and you should be able to reason about that. That is the goal of a logical language.

[00:13:35] And when we think about logic, we have three main ingredients. I'm going to go into these details a little bit more in our first module, but let me just give you a quick overview. We're going to have syntax, and syntax basically tells us what are
the symbols of that language; basically, it defines the set of valid formulas. So syntax here, for example in propositional logic, could be "rain and wet". Okay, so when I write "rain and wet" here, in syntax land this doesn't have any meaning; it's just a symbol, just a shape. "Rain" and "wet" don't have any meaning; they're just symbols. So when you're talking about syntax, you're really talking about the symbols that are the building blocks of the language. But syntax alone is not going to be able to define a language. In addition to syntax, what we need is semantics: we actually need to give meaning to the syntax. So for each one of these formulas, we need to be able to specify a meaning, and it has a very precise meaning.
So the meaning corresponds to an assignment, a configuration of the world, a setting of the world that corresponds to that syntactic formula. For example, in the case of "rain and wet", it corresponds to a specific meaning where rain takes value one and wet takes value one; this is a specific model, a specific world we live in, and in this world "rain and wet" has this particular meaning. Okay, so the main ingredients of logic are, first off, syntax and semantics, and once we have syntax and semantics, then we can talk about inference rules. We can actually talk about what we can infer now that we have a set of formulas, a set of knowledge about the world. So given that we have a formula f, could we infer, could we derive, a new formula g? Could we figure out whether g is true or not based on f?
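One brute-force way to answer this kind of question is to enumerate all models. The sketch below is my own illustration, not course code: it treats a model as a dictionary of truth values and formulas as Python predicates, and checks that every model satisfying f also satisfies g.

```python
from itertools import product

symbols = ["rain", "wet"]

def entails(premise, conclusion):
    """premise entails conclusion iff conclusion holds in every model where premise holds."""
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if premise(model) and not conclusion(model):
            return False
    return True

print(entails(lambda m: m["rain"] and m["wet"], lambda m: m["rain"]))  # True
print(entails(lambda m: m["rain"], lambda m: m["wet"]))                # False: rain alone says nothing about wet
```

This is exponential in the number of symbols, which is exactly why the lecture goes on to study inference rules that manipulate formulas directly.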
For example, if you tell me "rain and wet" as a formula, from that I can derive that rain is also true, right? Because it's got to be rain and wet, so from that I should be able to derive rain. And that's what we are going to spend quite a bit of time on this week: what are the inference rules that we can play around with, and how do they apply to different types of logic? Okay, so three ingredients: syntax, semantics, and inference rules.

[00:16:11] All right, so let me just make a bigger point about this difference between syntax and semantics, because the difference might be a little subtle. So again, if you think about syntax, syntax is talking about the valid expressions that are in your language; it's basically talking about the symbols, right, the things that are valid to write in this language.
Semantics is about what the expressions mean. So let me give you an example here. Let's say you're looking at 2 plus 3 versus 3 plus 2. Okay, 2 plus 3 and 3 plus 2 have different syntax: they're not the same, they don't look the same. If I have no idea what two means, or plus means, or three means, then 2 plus 3 has nothing to do with 3 plus 2; they have two very different syntaxes. But they have the same semantics, right? If I know what plus means, what two means, and what three means, and I know that 2 plus 3 is five and 3 plus 2 is five, then they have the same meaning, the same semantics. So: different syntax, but the same semantics. On the other hand, we can have settings where we have the same syntax, things look the same, but they have different meanings. For example, you can look at 3 over
python 2.7 versus python three [00:17:25] two in python 2.7 versus python three and in that case it's like it looks the [00:17:27] and in that case it's like it looks the same three looks the same the divide [00:17:30] same three looks the same the divide looks the same two looks the same so [00:17:32] looks the same two looks the same so syntactically these two are exactly the [00:17:34] syntactically these two are exactly the same thing but semantically they have [00:17:36] same thing but semantically they have different meanings they actually [00:17:38] different meanings they actually correspond to different values right [00:17:39] correspond to different values right when you're doing this in python 2.7 or [00:17:42] when you're doing this in python 2.7 or python 3. okay so so again we have two [00:17:45] python 3. okay so so again we have two expressions that have the same syntax in [00:17:47] expressions that have the same syntax in this case but they have different [00:17:48] this case but they have different meanings and different semantics so [00:17:50] meanings and different semantics so syntax and semantics are two different [00:17:52] syntax and semantics are two different things both of them are needed to define [00:17:54] things both of them are needed to define a logical language [00:17:56] a logical language and i want to kind of like end with this [00:17:59] and i want to kind of like end with this this view uh that and this diagram that [00:18:01] this view uh that and this diagram that i'm gonna come back to and explain it in [00:18:03] i'm gonna come back to and explain it in in a bit more detail in kind of future [00:18:05] in a bit more detail in kind of future modules so so the idea is we have two [00:18:08] modules so so the idea is we have two worlds here we have on the left we have [00:18:11] worlds here we have on the left we have uh syntax syntax land and on the right [00:18:14] uh syntax syntax land and on the right we have semantics 
[00:17:56] And I want to end with this view, this diagram, that I'm going to come back to and explain in a bit more detail in future modules. So the idea is that we have two worlds here: on the left we have syntax land, and on the right we have semantics land. Okay, so in syntax land we have formulas; I'm going to use these rectangles to represent formulas, different formulas that I can write, like "rain and wet". And each one of these formulas has a meaning in semantics land, and those meanings are called models here. And our goal throughout the lecture is to first define syntax and semantics for different types of logics, and then come up with inference rules that allow us to manipulate these formulas, these compact formulas that are nice and expressive, and derive new formulas whose meanings are also entailed by the meanings of our current formulas. So, more on this later; if this is confusing, I will talk about this in
more detail in a few modules. Okay.

[00:19:10] Just to give you a quick overview of the different types of logics we will be talking about: there are different types of logics, and in this order they're increasing in expressivity. This week we are going to be talking about the bolded ones, so we will be talking about propositional logic, and specifically a subset of propositional logic that only has these things called Horn clauses; we'll talk about what those are. And we will also talk about first-order logic: first-order logic only with Horn clauses, and just generally first-order logic. There are other types of logic that we're not discussing in this class, second-order logic, temporal logic, and they're actually quite useful in a
variety of fields: in programming languages, in robotics, and in formal methods. If you're interested in any of these, we can chat about it offline. One other point I want to make is that as we increase the level of expressivity of the logic here, as you go down this list, the expressivity of these logics gets higher and higher, but what you're losing is computational efficiency: if you want to run inference rules, it's going to become much more difficult if you're running them on first-order logic as opposed to propositional logic. So there is a trade-off between computational efficiency and the expressivity of the logical language.

[00:20:35] All right, so with that, this is the roadmap for this week's lectures. We're going to start with modeling, and by modeling here what I mean is defining
the syntax and semantics of logic. So we're going to talk about propositional logic and its syntax, and then we're going to talk about the semantics of propositional logic. At that point we're going to switch to inference and discuss inference rules; in general we're going to be talking about two main inference rules, modus ponens is one, and the other one is resolution. So we're going to talk about how, under propositional logic, we do modus ponens and how we do resolution. At that point we're going to switch back to a higher level of expressivity in terms of our models: we're going to talk about first-order logic, again the syntax and semantics of first-order logic, and after that we're going to talk about modus ponens again, as an inference rule for first-order logic. And we have an optional module at the end, which is about resolution for first-order logic; this one gets a little bit hairy.
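The modus ponens rule that anchors the inference part of this roadmap can be previewed with a small sketch. This is my own illustration, not course code, and the representation of rules as (premises, conclusion) pairs is an assumption: modus ponens says that from p1 ∧ ... ∧ pk → q and known facts p1, ..., pk, we may derive q, and repeating this until nothing changes is forward chaining.

```python
def forward_chain(facts, rules):
    """Repeatedly apply modus ponens: whenever every premise of a rule is a
    known fact, add the rule's conclusion as a new fact, until nothing changes."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)
                changed = True
    return facts

# Hypothetical knowledge base: rain is a fact; rain -> wet; wet -> slippery.
rules = [(["rain"], "wet"), (["wet"], "slippery")]
print(sorted(forward_chain(["rain"], rules)))  # ['rain', 'slippery', 'wet']
```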
If you're interested, you can take a look at it and at how resolution gets applied to first-order logic. And then, we're not talking about learning during this week. In general, learning has more recently been applied to logical formulas, specifically in the area of formal methods; people have been thinking about learning logical formulas from data, from demonstrations, but that's outside the scope of this class, so we will not be talking about that.

================================================================================
LECTURE 042
================================================================================
Logic 2 - Propositional Logic Syntax | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=LBjNaewGJzk
---
Transcript

[00:00:04] All right, so in this module we're going to be talking about the syntax of propositional logic. If you remember this diagram, what we're going to be talking about in this lecture is thinking about the syntax of logic, thinking
about the semantics of logic, the meaning of logic, and in addition to that, thinking about inference rules: how can we manipulate logic? One point I want to mention is that you might have seen logic in other classes; you might have seen logical formulas and been able to manipulate them and move things around, and that's not really the point here. The point here is to have this general framework, this more principled way of looking at logic, where we can think about algorithms that can manipulate logical formulas and can run inference rules, just more generally, from an algorithmic perspective. So the point is not for you to be able to do logic and move things around; the point is to have an algorithm that can do logic. Because the goal of this class is to have an artificial intelligence that can do similar things to how
humans would do it. So the point is not for you to do logic; the point is for the AI to be able to do logic. An analogy to that is the Bayesian networks lecture: last week we talked about Bayesian networks, and in that setting you might be able to compute conditional and marginal probabilities perfectly fine, you might be able to manipulate things perfectly fine, but that was not the point. The point was to have an algorithm, maybe like Gibbs sampling, that can be applied more generally to any Bayesian network, not to a single example. So we're basically trying to do a similar thing in the space of logic here. Okay.

[00:01:41] So let's talk about syntax. What is syntax? The syntax of propositional logic consists of a few different things. It consists of propositional
symbols, so these could be a or b or c; these take boolean values. Then, based on these propositional symbols, we can build formulas on top of them. The propositional symbols are also commonly known as atomic formulas, and you can make more complicated formulas based on these atomic formulas using a set of logical connectives: negation, and, or, implication, and bidirectional implication.

[00:02:19] So let me actually write that here. We're going to start with syntax. What does syntax have? Syntax has propositional symbols, so these are like a, b, c, and so on, and then we can have formulas defined on top of them; let me use f for formula. And how do I define formulas? I use these logical connectives to create formulas, the connectives that we just
talked about. Okay. So here are a couple of examples of how we go about it. We can build these formulas recursively: let's call f and g formulas here, and if f and g are formulas, then I can build even more formulas on top of them. I can have negation of f as a new formula, or f and g as a new formula, or f or g as a new formula, f implies g, or f bidirectionally implies g, meaning f is equivalent to g, you can think of it like that. Okay, here are a few examples. If a is a boolean symbol, a is a formula by itself. Negation of a is a formula. Negation of b implying c is a formula; I've just used a bunch of connectives and created a more complicated formula here. I can have this one as a formula, right: negation of a is a formula, negation of b is a formula, negation of b implying c is a formula, negation of b or d is a formula, and then the or of these and
the and of these is also going to be a formula. Okay. Negation of negation of a is a formula; why is that? Because a is a formula, negation of a is a formula, and the negation of that is also a formula. And this one, a followed by negation of b, is not a formula. Why is that the case? Well, negation of b is a formula and a is a formula, but a and negation of b are not connected with each other using any logical connective; this is just putting two logical formulas right next to each other without any connective, and that's not a formula. a plus b is not a formula. Why is that? Because plus just doesn't have any meaning here; its syntax is not defined, right? I never defined plus, so that doesn't make sense in this logic; it's not defined in this language.
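The recursive grammar above translates directly into a recursive data structure. The sketch below is my own representation, not the course's (the class names Symbol, Not, And, Or, Implies are illustrative): a formula is either an atomic symbol or a connective applied to subformulas, so ill-formed strings like "a ¬b" or "a + b" simply cannot be constructed, because every constructor requires a defined connective joining well-formed subformulas.

```python
from dataclasses import dataclass
from typing import Union

# A formula is either an atomic propositional symbol or a connective applied
# to subformulas; a bidirectional-implication class would follow the same pattern.
@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass(frozen=True)
class Not:
    arg: "Formula"

@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Implies:
    left: "Formula"
    right: "Formula"

Formula = Union[Symbol, Not, And, Or, Implies]

a, b, c, d = (Symbol(s) for s in "abcd")

f1 = Not(a)                                   # negation of a
f2 = Implies(Not(b), c)                       # negation of b implying c
f3 = And(Implies(Not(b), c), Or(Not(b), d))   # connectives composed recursively
f4 = Not(Not(a))                              # well-formed: each Not wraps a formula
```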
Okay. And one other point I want to mention here, and we'll talk about semantics soon, is that you can think of syntax as just symbols. Syntax doesn't have any meanings, right? Syntax is just the symbols that we are using, with no meanings assigned to them, and the job of semantics is to assign meanings: what does negation actually mean, what does implication mean? When we're talking about syntax, I could use any other symbol; I could use this symbol and just define that in my logic, and that would be the syntax of my logic. So don't assign any meanings just yet; it's just symbol manipulation when we're talking about syntax. But in the next module we're going to talk about semantics and giving some meanings.
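As a small preview of that next step, here is a sketch of how semantics can be implemented: a minimal recursive evaluator, of my own design (the nested-tuple formula representation and the name `evaluate` are assumptions, not course code), that assigns each formula a truth value given a model mapping symbols to booleans.

```python
def evaluate(formula, model):
    """Recursively evaluate a formula (nested tuples) under a model (symbol -> bool)."""
    if isinstance(formula, str):          # atomic propositional symbol
        return model[formula]
    op = formula[0]
    if op == "not":
        return not evaluate(formula[1], model)
    if op == "and":
        return evaluate(formula[1], model) and evaluate(formula[2], model)
    if op == "or":
        return evaluate(formula[1], model) or evaluate(formula[2], model)
    if op == "implies":                   # f -> g is false only when f is true and g is false
        return (not evaluate(formula[1], model)) or evaluate(formula[2], model)
    raise ValueError(f"unknown connective: {op}")

print(evaluate(("and", "rain", "wet"), {"rain": True, "wet": True}))        # True
print(evaluate(("implies", "rain", "wet"), {"rain": False, "wet": False}))  # True: vacuous implication
```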
================================================================================
LECTURE 043
================================================================================
Logic 3 - Propositional Logic Semantics | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=N37yIn1jX98
---
Transcript

[00:00:05] All right, so in this module we're going to be talking about semantics. We started talking about syntax in propositional logic, and we defined propositional formulas, which basically take propositional symbols and logical connectives and put them together to symbolically create something that we call a formula. That was a syntactic view of things, where we didn't assign any meanings; there were no meanings for anything, it was just symbols. [00:00:33] What we would like to do in this module is assign meanings to those syntactic formulas that we defined, and that corresponds to semantics. So in this module we're going to be talking about semantics and giving meanings to those formulas.
[00:00:47] In general, in this lecture we're going to have a good number of definitions, so I'm going to write out those definitions on a separate whiteboard so we can keep track of them. A good number of definitions are coming up, especially in this module, so let's start with some of them. [00:01:03] The first definition is the definition of a model, and this is a very poor choice of words: we've been using the word "model" throughout the lectures of this class as a different thing, right? We talked about modeling, inference, and learning. But in the logic lectures we're going to assign a different meaning to the word "model," and that has historical reasons, because historically "model" has been used in logic in this particular way, to refer to assignments, really.
So for this lecture and the logic lectures, [00:01:39] let's refer to a model in propositional logic as an assignment of truth values to propositional symbols. I'm going to use the letter w for a model, w for "world"; that's why it's called w. So a model w in propositional logic is just an assignment of truth values. Okay, so what does that mean? Let's look at an example. [00:02:00] Let's say we have three propositional symbols, A, B, and C. How many models do we have? Well, we can have eight possible models, right? Two to the three possible models, or worlds, that we can live in. And a particular w, a particular model, is going to be a particular assignment. So for example, I can pick A equal to 0, B equal to 1, and C equal to 0; that's a model, one w that corresponds to an assignment of truth values to propositional symbols. Okay, so let me write that on our whiteboard.
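The counting argument above (three symbols, 2^3 = 8 models) can be checked directly by enumeration. A minimal sketch; representing a model as a dict from symbol to truth value is my assumption for illustration, not the lecture's notation:

```python
from itertools import product

# A model (world) w assigns a truth value (0 or 1) to each propositional symbol.
# With three symbols a, b, c there are 2^3 = 8 possible models.
symbols = ["a", "b", "c"]

models = [dict(zip(symbols, values))
          for values in product([0, 1], repeat=len(symbols))]

print(len(models))                         # 8, i.e. 2 ** 3
print({"a": 0, "b": 1, "c": 0} in models)  # True: the model picked in the example
```

The second print confirms that the particular model from the lecture, A = 0, B = 1, C = 0, is one of the eight.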
[00:02:34] So going back here, I'm going to start under "Semantics," and then we have the word "model," and I'm going to use the letter w for it. Okay, all right, let's go back. [00:03:00] All right, so now we are ready to define this thing that's called the interpretation function, and the interpretation function is the thing that actually gives us semantics, gives us meaning. So what is an interpretation function? Let F be a formula, which is what we defined in syntax, and let w be a model. An interpretation function I takes a formula and a model, and it basically outputs true or false; it tells us whether w satisfies F or w doesn't satisfy F. [00:03:33] So the interpretation function is really the thing that binds formulas and models: formulas live in the syntactic land, and models live in the semantic land.
[00:03:46] The interpretation function is trying to connect them and tell us whether that is true or not. So let me go back here and write out "interpretation function." It's a function I that takes F, a formula, and w, a model, and gives us true or false. So let me go here and show this. [00:04:14] Let's say that we have a formula F; I'm going to draw our formulas using these rectangles. In the syntactic land you might have a number of formulas; let's say I have one formula F. And in the world of semantics we might have different models, right? These are different models, or worlds, that we can live in. And I can pick a specific one; let me call that w. [00:04:49] And what I can do is basically connect the formula to this w using this interpretation function.
[00:05:01] So I have an interpretation function of F and w that gives me true or false, depending on whether w satisfies F or not. I can have other w's in this space of models, in this semantic land. So how do we define an interpretation function? [00:05:14] The way we define an interpretation function is recursively, in a similar way to how we defined our syntax. We're going to start with propositional symbols. We have these propositional symbols A, B, and C, right? These take Boolean values, and the interpretation function of each one of these propositional symbols P and a model w is just going to return w of that propositional symbol. Remember, w is an assignment to these. So going back here, let me give you an example. [00:05:45] My w is going to be an assignment that maybe says A takes the value 0, and my propositional symbol is just A.
[00:05:58] So if I look at the interpretation function over P and w, it's basically interpreting A in the model where A is assigned 0, and that returns the value 0. Okay, so that is just the base case. [00:06:18] Then, when we think about more general formulas, how are they defined? They're defined based on logical connectives applied over propositional symbols, and based on that we can recursively define this interpretation function. So I can have a formula F and a formula G, and I can create this kind of truth table: the interpretation function of F and w could take value 0 or 1, and the interpretation function of G and w could take value 0 or 1. And then, for any of these other logical connectives, I can basically define them recursively.
[00:06:56] What would the interpretation function of negation-of-F and w be? It would basically negate this column, right? So it would be 1, 1, 0, 0. Or if I'm thinking about the interpretation of F-and-G and the model w, what would that be? That would be the interpretation function of F and w ANDed with the interpretation function of G and w, so basically ANDing these two columns, and that gives us these values, and so on. [00:07:21] Similarly, we can define the interpretation function over F or G, or F implying G, or the bidirectional implication of F and G, and so on, and then we can assign meanings to these more generic formulas. [00:07:36] All right, so let's look at an example of how we do this recursively. Let's say we have a formula F, and that formula is negation-of-A AND B, bidirectionally implying C. Okay, so that's my formula.
[00:07:51] I have a model; that model is a truth assignment to my propositional symbols A, B, and C. Let's say A is 1, B is 1, and C is equal to 0, and now I can call the interpretation function on F and w and see what its value would be. [00:08:04] How do we do that? Well, let's start with these nodes. At this node I can call the interpretation function over the symbol A and w. What is that equal to? It's equal to just 1, because I'm just going to read it off my table of models; that's just equal to 1. Then negation of A is going to be equal to 0. What is the interpretation function of B and w? Again, I have a model that tells me B takes value 1, so that's 1. And then if I take the interpretation function of negation-of-A AND B, that is the AND of these two, so 0 AND 1 gives me 0.
[00:08:44] Similarly, I can look at the interpretation function of C and w; reading that off my model, it is equal to 0. And then, when I look at the equivalence of this C and negation-of-A-AND-B, well, these two are equal, so that's just going to be equal to 1. Okay, so this is just showing recursively how we run an interpretation function. [00:09:03] See, there's no learning here; this is defined by logic, right? You could define your own logic, that would be fun, but this is defined by the propositional logic that we have defined using our formulas and our connectives and so on, and I'm just computing this; I'm not doing anything fancy here. [00:09:25] All right, so for each formula and model, we can interpret it using this interpretation function, and that gives us a value of 0 or 1.
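The recursive evaluation just traced can be written out as a short recursive function. A sketch under the same assumed nested-tuple representation of formulas (my notation, not official course code); it reproduces the lecture's worked example F = (¬A ∧ B) ↔ C with w assigning A = 1, B = 1, C = 0:

```python
def interpret(f, w):
    """Recursively evaluate formula f in model w, returning 0 or 1."""
    if isinstance(f, str):             # base case: I(p, w) = w(p)
        return w[f]
    op = f[0]
    if op == "not":
        return 1 - interpret(f[1], w)  # negate the subformula's value
    a, b = interpret(f[1], w), interpret(f[2], w)
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "implies":
        return (1 - a) | b             # f -> g is equivalent to (not f) or g
    if op == "iff":
        return 1 if a == b else 0      # biconditional: true when both sides agree
    raise ValueError(f"unknown connective: {op}")

# The worked example: f = (not a and b) <-> c, with a=1, b=1, c=0.
f = ("iff", ("and", ("not", "a"), "b"), "c")
w = {"a": 1, "b": 1, "c": 0}
print(interpret(f, w))  # 1: both sides evaluate to 0, so the biconditional holds
```

As in the lecture's trace: ¬A is 0, 0 AND 1 is 0, C is 0, and since the two sides agree the biconditional returns 1.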
[00:09:36] Okay, so now I'm going to define this thing that's called models of F, and basically it's a set of w's; it's the set of models where the interpretation function is equal to 1. So going back here, there could be one w, and I can check the interpretation function of F and w; I can also be looking at a set of models, and I can call that models of F. [00:10:03] And what is models of F? Let me write that here: models of F is the set of w's such that the interpretation function of F and w is equal to 1.
[00:10:21] So let me write that in my set of definitions. We talked about the interpretation function, and we talked about a single model; now we have models, M of F. What is that? It is the set of w's such that the interpretation function of F and w is equal to 1. All right, so now we have our models; let's go back here. [00:10:56] Basically, intuitively, you can think of models of F as all the worlds, all the assignments, where F holds; anything outside of it is still a possible world, but one where this particular F doesn't necessarily hold. [00:11:15] Okay, so let's look at an example. Let's say our formula F is Rain or Wet. Then, if I think about models, all possible models come from Rain taking value 0 or 1 and Wet
taking values zero or one and with taking value 0 1 so i can kind of like [00:11:29] taking value 0 1 so i can kind of like show that by this 2x2 grid that's all [00:11:32] show that by this 2x2 grid that's all possible models but what is models of f [00:11:35] possible models but what is models of f models of f is when rain or red holds [00:11:39] models of f is when rain or red holds and that is the shaded area right so the [00:11:41] and that is the shaded area right so the shaded area here is showing kind of like [00:11:44] shaded area here is showing kind of like the meaning of this formula f which is [00:11:47] the meaning of this formula f which is again symbolically written doesn't have [00:11:49] again symbolically written doesn't have a meaning but models of f is assigning a [00:11:52] a meaning but models of f is assigning a meaning to it so it's saying hey these [00:11:54] meaning to it so it's saying hey these grids is showing like what is the [00:11:57] grids is showing like what is the meaning of rain or wet okay [00:12:00] meaning of rain or wet okay and and the key idea here in logic in [00:12:02] and and the key idea here in logic in general is there's this formula although [00:12:04] general is there's this formula although it is written like again syntactically [00:12:06] it is written like again syntactically and then it is it is a symbolic [00:12:08] and then it is it is a symbolic representation it's a very compact like [00:12:10] representation it's a very compact like representation of a giant set of models [00:12:13] representation of a giant set of models right so so in general the nice thing [00:12:15] right so so in general the nice thing about logic is is we could use formulas [00:12:18] about logic is is we could use formulas to compactly represent like very like [00:12:20] to compactly represent like very like large meanings like a lot of times like [00:12:22] large meanings like a lot of times like exponential meanings could be 
[00:12:23] A lot of the time, exponentially large meanings can be represented by formulas that are pretty compact and nice, and that is the power of logic: you can write things compactly, and then you can do operations on them, you can do inference on them, and so on, and that is really nice. [00:12:37] All right, so that was formulas and models, with interpretation functions binding formulas and models. What we want to do now is think about how we could do operations here: what would new formulas add, in terms of meaning, to the knowledge that we already have? [00:12:55] For that, let's define something called a knowledge base. A knowledge base is a set of formulas that I already know. So if I have a system, say a virtual assistant, that I want to add logic to, or that I want to speak to using language or using logic,
[00:13:12] that system has a knowledge base, which is a set of formulas that are already represented; it's a conjunction, or intersection, of a bunch of things that it already knows. So let me go back here and write out "knowledge base": I'm going to use KB for this, and it is a set of formulas that you already know. [00:13:40] So we might already know a formula that says Rain or Snow, or we might already know there's Traffic. This is our knowledge base. So then what happens is that someone might come and give me a new formula, and what we're interested in is how that affects our knowledge. [00:14:01] Before getting there: a knowledge base is a set of formulas, so it is in the syntax land. What would be its analog in the semantics land? It would be models of KB.
[00:14:12] And what is models of KB? Models of KB is going to be an intersection of the models of each F. Let's look at an example. [00:14:21] Let's say I have a formula F1, and F1 says it's raining and snowing; and maybe I have F2, and F2 says there is traffic. Let me separate these. And I have a knowledge base, and my knowledge base has F1 and F2 in it. So someone already told me it's raining and snowing, and there's traffic. [00:14:52] So what would models of KB be? Models of KB is going to be the intersection of models of F1 with models of F2. Why is that? Because if you think about it, F1 is a formula, and F1 has a set of models corresponding to it, models of F1. And F2 is another formula that I'm just adding to my knowledge base,
[00:15:26] and that has a bunch of other models corresponding to it. And my knowledge base is now going to be the intersection of these two, because as we add more formulas, as we add more knowledge to our knowledge base, our set of models becomes smaller and smaller, because we are adding more constraints, which is pretty interesting, right? [00:15:48] So in general, let me maybe write that in a different color: if I have a knowledge base and I add a new formula to it, say I union it with a new formula F, what would the effect of that be on models of KB? The effect on models of KB is going to be what I had for models of KB, intersected with models of F. [00:16:11] So adding new formulas is constraining our models, constraining the meaning more and more. Because if you have raining-and-snowing
[00:16:21] you have traffic as a whole other set of models, the intersection of the two is going to give me these models. Okay, so also let me connect this: these models here correspond to these models there. All right, so let's go back here. So that's how we define the models of a knowledge base. Let's look at another example here. Let's say we're looking at rain as one formula. The models of rain are going to be this shaded area, where rain is equal to one. And then we have another formula, rain implies wet. And what are the models of rain implies wet? It is basically the negation of rain, or wet; so it is this shaded area. If I'm looking at a knowledge base that has both of these in it, then the models of that knowledge base are just going to be the intersection of these two shaded areas, which is basically this square.
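The picture described here, where each formula carves out a set of models and the knowledge base keeps their intersection, can be sketched in a few lines of Python. This is a brute-force illustration over a tiny symbol set; the encoding of formulas as boolean functions and the `models` helper are assumptions for this sketch, not code from the course.

```python
from itertools import product

# A model is an assignment of truth values to the propositional symbols.
# For this sketch, a formula is encoded as a function from a model (dict) to bool.
SYMBOLS = ["rain", "wet"]

def models(formula):
    """Return the set of models (frozensets of true symbols) that satisfy formula."""
    result = set()
    for values in product([False, True], repeat=len(SYMBOLS)):
        w = dict(zip(SYMBOLS, values))
        if formula(w):
            result.add(frozenset(s for s in SYMBOLS if w[s]))
    return result

f1 = lambda w: w["rain"]                    # Rain
f2 = lambda w: (not w["rain"]) or w["wet"]  # Rain implies Wet

# Models of the knowledge base = intersection of the models of its formulas.
kb_models = models(f1) & models(f2)
print(kb_models)  # the single model where both rain and wet are true
```

With two symbols there are only four candidate models, so enumeration is cheap; the point is just to watch M(KB) shrink as each formula's model set is intersected in.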
[00:17:16] That's where we have both rain and rain implies wet holding. All right, sounds good. And then this is what I've already basically mentioned: we have a knowledge base; if we add a formula to it, we increase the size of our knowledge base, but we're shrinking the size of the set of models, because we're constraining things more and more. So we are constraining the meaning part. All right, so now let's talk about this idea of what happens if I have a knowledge base and I add a new formula. So I have a knowledge base, I'm trying to add a new formula, and we'll see what happens. There are three things that can happen. One option is entailment. What entailment says is: if I have KB as my knowledge base, and you come and tell me a new formula f, and that formula is not adding anything to my knowledge base, then we say we have entailment. Okay, so this is a
[00:18:08] scenario where f is just not adding any information or any new constraints; it's basically telling me things I already knew. Okay, so we say KB entails f, and that is written using this double-line symbol. So we say KB entails f if and only if the models of KB are a subset of the models of f. Let's look at an example here. So let's go back here; maybe I'll start a new page. So we have three options; one is called entailment. So let's start with a knowledge base, and my knowledge base maybe is rain and snow. So I have a formula in my knowledge base that says rain and snow, and that has models corresponding to it; so this is the models of KB. And you might come and tell me a new formula, and that new formula is rain. And if you tell me rain, and I already have rain and snow in my knowledge base, that doesn't add any knowledge to me,
[00:19:21] right? I already knew it was raining. So then the models of f are going to be a superset of the models of KB. Okay, so we say KB entails f if and only if the models of KB are a subset of the models of f. So f didn't tell me anything new; I already knew that, and that is entailment. So let me go back here and maybe add these under our definitions. So now we have defined entailment. Okay, all right, so let's go back here. So rain and snow entails rain. Okay, so that was one option. Another option is contradiction. So what is contradiction? Contradiction is a scenario where you're telling me a new formula f: I already have a knowledge base KB, you tell me a new formula f, and that is contradicting my knowledge base. Okay, so in models land, what happens is that
[00:20:35] the models of KB don't have any intersection with the models of f. So f contradicts our knowledge base if and only if the models of KB intersected with the models of f are going to be the empty set. Right, so let's look at an example; let's maybe go back here. So our second option is contradiction, so let's write that here: contradiction. So contradiction is a scenario where I know some knowledge base; my knowledge base is maybe rain and snow again. So I think it's raining and snowing, and then you come and tell me a new formula, and that new formula is maybe the negation of snow. Okay. And then that contradicts my knowledge base, right? So if that contradicts my knowledge base, what happens is that there is a set of models of KB and there is a set of models of f, and they don't have any intersection. So contradiction is a scenario where the models of KB
[00:21:41] intersected with the models of f are empty. One other interesting thing to notice here is that, if you think about contradiction, contradiction is very related to entailment. Contradiction is the same thing as entailing the negation of f. And why is that the case? Because if you look at the models of f, the models of the negation of f are anything outside of it, right? So if this is the models of the negation of f, then what is happening is that the models of KB are a subset of the models of the negation of f. And if you remember our definition of entailment, that is the same thing as KB entailing the negation of f. Okay, so that's pretty interesting, because contradiction is the same thing as entailing the negation of f. All right, so those were the two cases so far, right? You either told me a new formula and I already knew it, so that is entailment, or you tell me a new
[00:22:50] formula and that contradicts the knowledge base that I've had, so that is contradiction. Okay, and let's add that here. So we talked about entailment, and now we've talked about contradiction. And we wrote entailment as KB entailing a formula, and contradiction as KB entailing the negation of a formula. Okay, all right, so there is a third case; let's talk about that third case. Let me skip that. So we talked about contradiction being very related to entailment, and KB contradicting f is the same thing as KB entailing the negation of f. All right, so the third case here is what we're calling contingency, and that is when you're telling me a formula and that formula is actually telling me something I didn't know; it's telling me some non-trivial information. Okay, so that is when the models of KB have some non-trivial intersection with the models
[00:23:51] of f. Okay, so that is when we write: the models of KB intersected with the models of f are going to be a subset of the models of KB, but a strict subset of the models of KB; if this is equal, we get entailment, so we don't include equality here. Okay, all right, let's look at an example; maybe let's go back here. So our third case is contingency, and that is when I have maybe my knowledge base, and maybe my knowledge base is just rain this time, and you come and tell me a new formula, and that new formula is snow. Okay, so my knowledge base thought it is raining, so I have my models of the knowledge base corresponding to raining here, and then you come and tell me, hey, by the way, it is also snowing. And the models of snowing are here, and there is some non-trivial intersection going on here. Okay, so contingency is when the models of KB intersected with the models of f are going to be a subset, and it's
[00:24:56] not going to be equal to the models of KB. And similarly, the empty set is going to be a subset of this, but it's not going to be equal. So you get some non-trivial information, something that you didn't know, and that gets added. Okay, so that is contingency. And going back here, let me add contingency as the third option. And contingency is when you have these non-trivial intersections; I'm not going to write it out. All right, let's go back here. Okay, so we have these three possibilities: you give me a new formula, and I'm either entailing it, or contradicting it, or I have contingency. So now let's talk about how we would use these ideas if we want to implement a virtual assistant. Remember, we started this lecture thinking about having a virtual assistant where we can talk to it in logic or language.
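The three cases can be checked directly on model sets. Here is a minimal brute-force sketch under the same assumptions as before (formulas encoded as boolean functions over a tiny symbol set; the helper names are made up for illustration, not the course's implementation):

```python
from itertools import product

SYMBOLS = ["rain", "snow"]

def models(formula):
    """All models (frozensets of true symbols) satisfying formula."""
    out = set()
    for values in product([False, True], repeat=len(SYMBOLS)):
        w = dict(zip(SYMBOLS, values))
        if formula(w):
            out.add(frozenset(s for s in SYMBOLS if w[s]))
    return out

def classify(kb_formulas, f):
    """Is f entailed by, contradicting, or contingent on the knowledge base?"""
    m_kb = models(lambda w: all(g(w) for g in kb_formulas))
    inter = m_kb & models(f)
    if inter == m_kb:     # M(KB) is a subset of M(f): f adds nothing new
        return "entailment"
    if not inter:         # no overlap at all: f conflicts with KB
        return "contradiction"
    return "contingency"  # proper, non-empty overlap: genuinely new information

kb = [lambda w: w["rain"] and w["snow"]]                     # KB: Rain and Snow
print(classify(kb, lambda w: w["rain"]))                     # entailment
print(classify(kb, lambda w: not w["snow"]))                 # contradiction
print(classify([lambda w: w["rain"]], lambda w: w["snow"]))  # contingency
```

This same three-way classification is what drives the tell and ask responses described in the lecture: entailment maps to "I already knew that" (or a definite yes), contradiction to "I don't believe that" (or a definite no), and contingency to "I learned something new" (or "I don't know").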
[00:25:56] And that virtual assistant, right, you can tell it some information, or we can ask it questions. And then maybe you want to implement this tell operation. So if I want to implement a tell operation in this virtual assistant, the virtual assistant is going to have some knowledge base at the moment, some KB, and I tell it f, a new formula f. So what can happen? So if I tell it a new formula, maybe I tell it it is raining, so I do tell rain, three things can happen, right? It can either entail f: my knowledge base might already have raining in it, and in that case the response to the tell operation, when I tell it it is raining, is going to be "I already knew that." Okay, so if my virtual assistant already entails rain, it should respond to me with "I already knew that." Okay, if it contradicts, then it should say "I don't believe that," because its knowledge base basically
[00:26:47] says it's not raining, and now you're telling it it is raining, and because of that it would respond with "I don't believe that," because its knowledge base tells it the opposite. Or, if you're telling it something new, you're telling it it is raining and it didn't know that, or didn't have any information about it, then it should say "I learned something new," and based on that thing that you're telling it, it should update its knowledge base; it should update its KB with this new formula that it is raining. Okay, so now we can implement a tell operation based on these three ideas of entailment, contingency, and contradiction. In a very similar fashion, we can also implement an ask operation. If you ask it "is it raining?", then based on that it can go ahead and it can answer yes, a definite yes, if KB already entails f, that is, if we have entailment. It should answer
[00:27:38] "no" if we have a contradiction, if KB contradicts f, or if KB entails the negation of f; so it should give us a definite no, because there's a contradiction. Or it should tell us "I don't know" if there is contingency: if you ask it "is it raining?" and it doesn't know, it should just say "I don't know." Okay, so going back to the things we were defining here, let me move this inference rule thing. Okay, so let me just write "tell and ask." So we talked about this tell and ask operation, and we can basically implement tell and ask based on this entailment, contradiction, and contingency. Okay, all right, so I want to just do a quick side note here. I don't want to go into this in too much detail, but there is a connection between the things we were talking about here and some of the topics we discussed like two weeks ago in
[00:28:33] Bayesian networks, right? So we've been talking about this idea of models, and a model is the same thing as an assignment, right? And you can basically think of having a Bayesian network as a distribution over these assignments, over these models. Right, I can have a equal to zero, b equal to zero, c equal to zero, and I can have a probability assigned to that; the probability of that could be 0.3. And I can have another assignment, or model, and I can have another probability assigned to it. So from a Bayesian network perspective, from a probabilistic perspective, one can think about logic in a probabilistic way and think about the probability of a formula given a knowledge base. So when you have a knowledge base, you have some knowledge, and you're asking about a formula, one can think of, instead of thinking
[00:29:19] about just these three different things, entailment, contradiction, and contingency, one can think of a probability, an actual value, right? The probability of the formula given a knowledge base. So what is that going to be equal to? That is going to be equal to the probability of the models w that exist in the intersection of the models of KB and the models of f, over the probability of all possible models of the knowledge base. So the w's in the denominator are all possible models of my knowledge base; I'm going to sum the probabilities of all possible models of my knowledge base in the denominator, and in the numerator I'm going to just focus on models that are in the intersection of the models of KB and the models of f. In other words, P(f | KB) is the sum of P(w) over w in M(KB ∪ {f}), divided by the sum of P(w) over w in M(KB). And then the union: M(KB ∪ {f}) is equal to M(KB) ∩ M(f); that's why there's a
[00:30:15] union here. You remember, if you add an f to your KB, you're shrinking your set of models; so that's why this numerator is smaller, right? You're shrinking your set of models by adding this f to the knowledge base. Okay. And if you think about this fraction, this is a number between zero and one, and now we have probabilities; we actually have a probability for f being satisfied or not, given a knowledge base. But in general, this was just a quick diversion talking about a probabilistic view of this. There's quite a bit of work actually in logic and probabilistic versions of it, thinking about probabilistic model checking: instead of giving just zero-one values, what would be a probabilistic view of it? We're not going to go into the details of those in this class, and basically you can think of these probabilities as these
[00:31:01] three different ways of looking at the problem that we have been talking about. If this probability is equal to zero, then we basically have an answer of no; we have contradiction, right? f is not satisfied. Well, if this probability is equal to one, if the numerator and denominator are equal to each other, then you're answering yes: f is not adding any information, so we have entailment. And if you get any other value in between, any other value between zero and one, then we are in a contingency situation, and we basically say we don't know. Okay, all right, so that was just a quick kind of connection to a probabilistic generalization of some of the things we were talking about with Bayesian networks.
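As a sketch of this probabilistic view, one can put an explicit distribution over models and compute P(f | KB) as just described: sum P(w) over M(KB ∪ {f}) in the numerator and over M(KB) in the denominator. The particular distribution below is made up purely for illustration, and the helpers are assumptions of this sketch:

```python
from itertools import product

SYMBOLS = ["rain", "wet"]

def all_models():
    """Every assignment of truth values to the symbols."""
    for values in product([False, True], repeat=len(SYMBOLS)):
        yield dict(zip(SYMBOLS, values))

def prob(w):
    """An arbitrary example distribution over models (sums to 1)."""
    table = {(False, False): 0.4, (False, True): 0.1,
             (True, False): 0.1, (True, True): 0.4}
    return table[(w["rain"], w["wet"])]

def p_given_kb(f, kb):
    """P(f | KB): mass on models satisfying both, over mass on models of KB."""
    num = sum(prob(w) for w in all_models() if kb(w) and f(w))
    den = sum(prob(w) for w in all_models() if kb(w))
    return num / den

kb = lambda w: w["rain"]  # KB: Rain
f = lambda w: w["wet"]    # query: Wet
print(p_given_kb(f, kb))  # 0.4 / 0.5 = 0.8, strictly between 0 and 1
```

A value of 0 would correspond to contradiction (a definite no), 1 to entailment (a definite yes), and anything strictly in between, like the 0.8 here, to contingency ("I don't know").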
[00:31:49] But now let's just go back to the same problem we were talking about, right? So we've talked about these three different things, entailment, contingency, and contradiction; we've talked about how we can have tell and ask operators based on that. Now we are going to talk about this idea of satisfiability. So what is satisfiability? A knowledge base KB is satisfiable if the models of KB are not empty. Okay, very simple: if the models of KB are not empty, we have satisfiability. Okay, so why is satisfiability useful? Why am I talking about satisfiability? Because satisfiability is a well-known problem; we have really good solvers for it, SAT solvers, and we're going to talk about that in one slide really quickly. So it's nice to think about these three different things, entailment and contingency and contradiction, in view of the problem of satisfiability. Okay, so we have
these three things satisfiability gives me a [00:32:43] three things satisfiability gives me a yes or no answer so how can i use [00:32:46] yes or no answer so how can i use satisfiability to answer if you are in [00:32:48] satisfiability to answer if you are in either in any of these situations [00:32:51] either in any of these situations so so the way you satisfy ability is we [00:32:53] so so the way you satisfy ability is we do two calls to satisfy [00:32:55] do two calls to satisfy so [00:32:57] so the first call is in general if i want [00:32:59] the first call is in general if i want to think about my ask operator or tell [00:33:01] to think about my ask operator or tell operator and if i want to reduce it to [00:33:03] operator and if i want to reduce it to satisfiability i can i can do two calls [00:33:05] satisfiability i can i can do two calls to satisfy both i can first ask if kb [00:33:08] to satisfy both i can first ask if kb union negation of f is satisfiable or [00:33:11] union negation of f is satisfiable or not okay so what does the answer to that [00:33:14] not okay so what does the answer to that give me so if i get no for that right if [00:33:17] give me so if i get no for that right if kb union negation of f is not [00:33:19] kb union negation of f is not satisfiable i have entailment right so i [00:33:22] satisfiable i have entailment right so i get [00:33:23] get my answer for entailment here [00:33:25] my answer for entailment here and if i get yes for that that doesn't [00:33:27] and if i get yes for that that doesn't answer everything right like if i just [00:33:29] answer everything right like if i just get yes for this i don't know if i'm in [00:33:31] get yes for this i don't know if i'm in a contingency situation or a [00:33:32] a contingency situation or a contradiction situation so what do i [00:33:34] contradiction situation so what do i need to do i need to do another call to [00:33:36] need to do i need to do another call to 
satisfy voting and the second call to [00:33:38] satisfy voting and the second call to satisfyability is asking if kb union f [00:33:42] satisfyability is asking if kb union f is satisfiable and what does that check [00:33:44] is satisfiable and what does that check well well if i get no for that then i'm [00:33:47] well well if i get no for that then i'm getting contradiction remember [00:33:48] getting contradiction remember contradiction is the same thing as [00:33:50] contradiction is the same thing as entailing negation of f that is why the [00:33:53] entailing negation of f that is why the answer for this gives me contradiction [00:33:55] answer for this gives me contradiction and if i get yes for that i get [00:33:56] and if i get yes for that i get contingency so what i've just done is is [00:33:59] contingency so what i've just done is is if i in general if i want to know if i'm [00:34:02] if i in general if i want to know if i'm in the entailment contradiction or [00:34:03] in the entailment contradiction or contingency situation [00:34:05] contingency situation then i can basically figure that out [00:34:07] then i can basically figure that out with two calls to satisfiability [00:34:09] with two calls to satisfiability and and why do i want to know i'm in any [00:34:12] and and why do i want to know i'm in any of these situations because that helps [00:34:13] of these situations because that helps me implement my ask and tell operations [00:34:17] me implement my ask and tell operations so going back here [00:34:19] so going back here so we talked about ask and so we talked [00:34:21] so we talked about ask and so we talked about how it relates to entailment [00:34:22] about how it relates to entailment contradiction and contingency and now we [00:34:24] contradiction and contingency and now we have talked about satisfiability [00:34:26] have talked about satisfiability as a way of answering [00:34:29] as a way of answering which scenario we are in okay 
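The two-call reduction above can be sketched in a few lines of Python. This is a minimal illustration, not the course's actual code: `is_satisfiable` here is a brute-force enumerator standing in for a real SAT solver, and the encoding of formulas as functions over a model dictionary is my own.

```python
from itertools import product

def is_satisfiable(formulas, symbols):
    """Brute-force stand-in for a SAT solver: try every assignment."""
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if all(f(model) for f in formulas):
            return True
    return False

def classify(kb, f, symbols):
    """Two satisfiability calls decide which scenario we are in."""
    if not is_satisfiable(kb + [lambda m: not f(m)], symbols):
        return "entailment"      # KB union {not f} unsatisfiable
    if not is_satisfiable(kb + [f], symbols):
        return "contradiction"   # KB union {f} unsatisfiable
    return "contingency"         # both calls answered yes

# KB = {Rain, Rain -> Wet}: asking about Wet gives entailment,
# asking about not-Rain gives contradiction.
kb = [lambda m: m["Rain"], lambda m: (not m["Rain"]) or m["Wet"]]
print(classify(kb, lambda m: m["Wet"], ["Rain", "Wet"]))       # entailment
print(classify(kb, lambda m: not m["Rain"], ["Rain", "Wet"]))  # contradiction
```

Note the order of the calls: only after the first call answers "yes" does the second call distinguish contradiction from contingency.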
[00:34:31] which scenario we are in okay and then how do we answer satisfiability [00:34:34] and then how do we answer satisfiability so so that's a good question to ask so [00:34:36] so so that's a good question to ask so what is satisfiability so satisfiability [00:34:39] what is satisfiability so satisfiability and checking satisfiable to the sad [00:34:40] and checking satisfiable to the sad problem in propositional logic is [00:34:43] problem in propositional logic is basically just a special case of solving [00:34:46] basically just a special case of solving a constrained satisfaction problem a csv [00:34:49] a constrained satisfaction problem a csv and we have already like like learned [00:34:50] and we have already like like learned about cscs and solving csvs so what that [00:34:53] about cscs and solving csvs so what that means is we can basically check [00:34:55] means is we can basically check satisfiability we can basically uh [00:34:59] satisfiability we can basically uh check if the csb problem like was the [00:35:01] check if the csb problem like was the solution to the csv problem and solve [00:35:04] solution to the csv problem and solve satisfied we'll give you the algorithms [00:35:05] satisfied we'll give you the algorithms that we already have access to okay and [00:35:08] that we already have access to okay and this idea of checking satisfiability is [00:35:11] this idea of checking satisfiability is called model checking you're checking if [00:35:13] called model checking you're checking if a model exists or not you're checking if [00:35:15] a model exists or not you're checking if an assignment exists or not okay so so [00:35:18] an assignment exists or not okay so so the mapping of the sat problem to csvs [00:35:21] the mapping of the sat problem to csvs is as follows so prepositional symbols [00:35:23] is as follows so prepositional symbols are basically variables what we used to [00:35:25] are basically variables what we used to call 
variables [00:35:26] call variables formulas are basically constraints and [00:35:30] formulas are basically constraints and then if you have variables and [00:35:31] then if you have variables and constraints you can come up with an [00:35:32] constraints you can come up with an assignment and that assignment is [00:35:34] assignment and that assignment is basically a model so you're checking if [00:35:36] basically a model so you're checking if a model exists or not you're checking if [00:35:39] a model exists or not you're checking if if a satisfying assignment exists or not [00:35:41] if a satisfying assignment exists or not okay [00:35:43] okay let's look at an example so let's say [00:35:44] let's look at an example so let's say our knowledge base has these two [00:35:46] our knowledge base has these two formulas in it we have a or b [00:35:49] formulas in it we have a or b and we have b um [00:35:51] and we have b um bi-directional implication negation of c [00:35:54] bi-directional implication negation of c okay [00:35:55] okay all right so we have three symbols a b [00:35:58] all right so we have three symbols a b and c these symbols are the same things [00:36:00] and c these symbols are the same things as csp variables so we can have three [00:36:03] as csp variables so we can have three nodes these three variables and then we [00:36:05] nodes these three variables and then we have basically two formulas these [00:36:07] have basically two formulas these formulas create constraints in our csv [00:36:10] formulas create constraints in our csv so we have a or b and then we have we [00:36:11] so we have a or b and then we have we have b equivalent negation of c [00:36:13] have b equivalent negation of c and then what are we doing so we have a [00:36:15] and then what are we doing so we have a csv we can solve it right we can find an [00:36:17] csv we can solve it right we can find an assignment a consistent assignment for [00:36:19] assignment a consistent 
assignment for it which is the same thing as a [00:36:21] it which is the same thing as a satisfying model and if you find an [00:36:24] satisfying model and if you find an assignment this problem is satisfiable [00:36:26] assignment this problem is satisfiable model shaking comes up with a model for [00:36:28] model shaking comes up with a model for it [00:36:29] it and then if it is not satisfiable it's [00:36:31] and then if it is not satisfiable it's going to return as unset it doesn't come [00:36:33] going to return as unset it doesn't come up with any assignments [00:36:35] up with any assignments so that's kind of nice going back here [00:36:37] so that's kind of nice going back here right like this problem that you've been [00:36:39] right like this problem that you've been talking about this tell and ask [00:36:41] talking about this tell and ask operation [00:36:42] operation reduces to entail my contradiction and [00:36:44] reduces to entail my contradiction and contingency i can use satisfiability to [00:36:47] contingency i can use satisfiability to cause the satisfiability to to answer [00:36:50] cause the satisfiability to to answer that then how do i do that well i use [00:36:52] that then how do i do that well i use model checkers to do that so so that's [00:36:54] model checkers to do that so so that's called model checking checking the [00:36:56] called model checking checking the satisfiability which is basically [00:36:58] satisfiability which is basically solving a csv okay [00:37:02] solving a csv okay all right so going back here okay so so [00:37:05] all right so going back here okay so so what does model checking do model [00:37:06] what does model checking do model checking takes as an input a knowledge [00:37:08] checking takes as an input a knowledge base and what does it output it outputs [00:37:10] base and what does it output it outputs if there exists a satisfying model or [00:37:12] if there exists a satisfying model or not and if 
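As a sketch of model checking on this exact example, here is a brute-force checker for KB = {A or B, B iff not C}. The function name `model_check` and the lambda encoding are illustrative, not from the lecture's code.

```python
from itertools import product

def model_check(kb, symbols):
    """Return a satisfying model of the KB, or None if unsatisfiable."""
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if all(f(model) for f in kb):
            return model
    return None

# The example KB: {A or B,  B <-> not C}
kb = [
    lambda m: m["A"] or m["B"],          # A or B
    lambda m: m["B"] == (not m["C"]),    # B iff not C
]
print(model_check(kb, ["A", "B", "C"]))  # a satisfying assignment

# An unsatisfiable KB makes model checking return None, i.e. "unsat".
print(model_check([lambda m: m["A"], lambda m: not m["A"]], ["A"]))  # None
```

The returned dictionary is exactly a model in the lecture's sense: an assignment to every propositional symbol that makes all formulas in the KB true.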
And if a satisfying model exists, it returns that model. Okay, so it checks whether models of KB is not empty. And there are a good number of algorithms out there that do model checking. One of the older ones is DPLL, a well-known algorithm for satisfiability and model checking. What it does is use backtracking search, and quite a bit of pruning and quite a bit of heuristics go into it to make sure it can solve the problem as fast as possible. Some more recent algorithms are things like WalkSAT, which is pretty similar to Gibbs sampling and does a randomized local search. There are a good number of satisfiability solvers out there; Z3 is a famous solver that you can look into if you're interested in solving SAT problems.
And with that, we now have a good idea of syntax and a good idea of semantics. Next, we would like to talk about what the formulas get us, right? Why do we live in formula land? Why do we even want to look at syntax? It turns out that we can do inference on formulas, and that buys us quite a bit. So in the next module we're going to be talking about what formulas buy us and how to do inference with rules.

================================================================================ LECTURE 044 ================================================================================ Logic 4 - Inference Rules | Stanford CS221: AI (Autumn 2021) Source: https://www.youtube.com/watch?v=RIk67yGMVv4 --- Transcript

[00:00:05] All right, so in this module we will be talking about inference rules. If you remember, so far we've been talking about syntax and semantics, and now we would like to talk about how we can play around with formulas, manipulate them, and apply inference rules to them. I want to talk about this diagram a little bit more for a second
before jumping into inference rules. Let me go back to my whiteboard here. Basically, what I've been drawing here is: we live in the syntax land, and formulas live in the syntax land, so I'm going to draw them like this. Maybe I have formula f1, formula f2, through maybe formula fn. Okay. And then in the semantics land I give meanings to these formulas, right? Each formula has a corresponding set of models, so each one of these formulas will correspond to something I'm calling models of f1. f2 might have another set of models, models of f2, and so on, and you might have a bunch of other ones. So let's say I only have three of these; actually, let me make it three to keep this simpler. So let's say I have f3, and f3 has another set of models corresponding to it, models of f3. Okay. And then this part defines our knowledge base, right? We talked about a knowledge base being a set of formulas, and the shaded area corresponds to models of the knowledge base; so this is models of our knowledge base. This is what we have talked about so far. Now we want to talk about what inference rules really do. If you have a set of knowledge in your knowledge base, this set of formulas, the idea of inference rules is: could you apply a set of syntactic rules on them (I'm going to call them inference rules) and infer something, a new formula? So from the formulas f1 through f3 that you have, could you infer a new formula, a new g, just based on the formulas that you have, by symbolically manipulating them? And the question is: could you make sure that the g you're inferring actually has a set of models that is a superset of the models of KB? Because ideally you want to infer something that follows directly from the formulas you have. So ideally you would want to be in a situation where models of g is a superset of models of KB, and what does that mean? That means KB entails g, right, since every model of the KB is then a model of g. So could you have a set of inference rules that end up giving you a g such that models of g is a superset of models of KB? Could we come up with those sets of inference rules and those g's? That is the idea of inference rules, and that is what we're going to be talking about today in this lecture. Okay.
All right, and that's basically what this diagram shows: we have a set of formulas, each of them corresponding to a set of models, and at the end of the day I want to apply a set of inference rules on these formulas and come up with models that are a superset of the models of my knowledge base. Would I be able to do that? All right, so let's talk about that; let me give you an example of what I mean. Let's say I know it is raining, so in my knowledge base I have that it is raining, and then I have that if it is raining then it is wet, so Rain implies Wet. If I tell you that it is raining, and that rain implies wet, what can you tell me just from that knowledge? From that knowledge you should be able to infer that, well, therefore it is wet, right? It's raining, and raining implies wet, so it's got to be wet. Okay, so that is the idea of inference rules: could we have a rule that basically infers Wet here, just based on the formulas? In general, in inference rules we have a set of premises, a set of formulas like Rain and Rain implies Wet, and based on that we want to come up with a conclusion; in this case, for example, that conclusion is that it is wet. That defines inference rules in general. There's a specific type of inference rule that we're going to talk about; it's a pretty simple one, and it's called modus ponens. Modus ponens is a very simple inference rule, and what it says is that for any propositional symbols p and q, if in my premises, in my knowledge base, I have p and p implies q, that allows me to conclude q, kind of like the example we saw here. And you should think of inference rules as very syntactic, symbolic views of the world: I basically just look at my knowledge base, and if I find anything that matches the form p and anything that matches the form p implies q, from that I can infer q. Okay, let me put this here in my set of definitions. So now we are at inference rules, and we are going to be talking about modus ponens. Modus ponens is an inference rule; it tells us that if we have p and p implies q, we can infer q. All right, so in general we can write inference rules in this way: we have a set of formulas f1 through fk, and following an inference rule allows us to conclude g; it depends on what inference rule we are using, and modus ponens is an example that we have just seen. And again, these rules are applied directly on syntax, and they do not care about semantics.
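Because modus ponens is purely syntactic, it can be written as a pattern match over formulas, with no reference to what the symbols mean. A minimal sketch, with formulas encoded as strings and `('->', p, q)` tuples (my own encoding, not the course's):

```python
def modus_ponens(kb):
    """From p and ('->', p, q) both in the KB, conclude q."""
    return {f[2] for f in kb
            if isinstance(f, tuple) and f[0] == "->" and f[1] in kb}

print(modus_ponens({"Rain", ("->", "Rain", "Wet")}))  # {'Wet'}
print(modus_ponens({("->", "Rain", "Wet")}))          # set(): premise Rain missing
```

Nothing here evaluates truth values; the rule fires purely on the shape of the formulas, which is the point being made above.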
They don't care about what raining means, or what Wet and Rain and the connection between them actually mean, right? They're just applied on the syntax, on the formulas. And that's kind of the power of logic: we talked about formulas as a compact way of representing much larger meanings, exponentially large meanings a lot of the time, and now, on these very compact formulas, we can apply syntactic rules, inference rules, and based on them infer new formulas that have new meanings. Okay, so if I want to think about what an inference algorithm does, kind of like a meta-algorithm, an inference algorithm does something of this form. We have an input, and that input is a set of inference rules; we've talked about modus ponens as an example, but in general I might have other inference rules. And what I'd like to do is repeat this loop until no more changes apply to my knowledge base: I choose a subset of formulas from my knowledge base, and if I can match my inference rule and infer a new formula g, I add g back to the knowledge base, and I keep doing this until there are no more new formulas to be added. Okay, so that is what an inference algorithm does. And one other definition here is this idea of derivation and proving. What we say is that a knowledge base proves, or derives, a formula f if and only if f eventually gets added to the knowledge base. So, going back to our definitions: now we have a definition of derivation, or proving, and we say a knowledge base derives f, represented by this single-line turnstile symbol. So, going back here: if I have f1 through f3 in my knowledge base, and applying inference rules gets me a new g, I would say my knowledge base is deriving, or proving, g. Okay. And so in the semantics land I have this idea of entailment, which might be different from what we have in the syntactic land, which is this idea of inferring, or proving, or deriving. All right, we'll talk about the relationship between these two in a few slides, but let me go back to talking about derivation a little bit more. Okay, so that is derivation, that is proving. Let's look at an example.
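The meta-algorithm just described, looping until no rule adds anything new, can be sketched as follows. The name `forward_chain` and the tuple encoding of implications are illustrative assumptions, and modus ponens is the only rule plugged in here:

```python
def modus_ponens(kb):
    """From p and ('->', p, q) both in the KB, conclude q."""
    return {f[2] for f in kb
            if isinstance(f, tuple) and f[0] == "->" and f[1] in kb}

def forward_chain(kb, rules):
    """Apply inference rules until the knowledge base stops changing."""
    kb = set(kb)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(kb) - kb
            if new:                # derived something not yet in the KB
                kb |= new          # add it back to the knowledge base
                changed = True
    return kb

kb = forward_chain({"Rain", ("->", "Rain", "Wet")}, [modus_ponens])
print("Wet" in kb)  # True: the KB derives Wet
```

In the notation above, every formula in the returned set is one the knowledge base derives (KB ⊢ f): it eventually got added to the KB.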
implies being wet and i have that wet implies slippery okay so the [00:09:14] that wet implies slippery okay so the question is can you can i apply the [00:09:16] question is can you can i apply the inference algorithm on this modus ponens [00:09:18] inference algorithm on this modus ponens using using just modus ponens and [00:09:21] using using just modus ponens and rules how does how does that work let's [00:09:24] rules how does how does that work let's actually try this out using using the [00:09:27] actually try this out using using the system that we looked at in the [00:09:29] system that we looked at in the overview lecture so um let's say it is [00:09:32] overview lecture so um let's say it is raining [00:09:33] raining okay [00:09:34] okay so it says i learned something i can [00:09:36] so it says i learned something i can look at the knowledge base so let's look [00:09:37] look at the knowledge base so let's look at what is in the knowledge base so [00:09:39] at what is in the knowledge base so raining is the in the knowledge base [00:09:41] raining is the in the knowledge base okay [00:09:42] okay i can say if it is raining [00:09:45] i can say if it is raining then it is wet [00:09:47] then it is wet okay so it says oh i learned something [00:09:50] okay so it says oh i learned something let's look at the knowledge base so it's [00:09:52] let's look at the knowledge base so it's ha it has it is raining [00:09:54] ha it has it is raining it has raining implies wet that is that [00:09:56] it has raining implies wet that is that is what this means because rain implies [00:09:58] is what this means because rain implies width is equivalent to not rain or red [00:10:01] width is equivalent to not rain or red right because that's what implication [00:10:03] right because that's what implication logical implication means [00:10:05] logical implication means and then based on these two things it [00:10:07] and then based on these two things it actually derives 
[00:10:09] wet: it applies modus ponens. Remember, I have "rain" and "rain implies wet"; what does modus ponens give me? Modus ponens gives me "wet," so I can derive it. Let's add "if it is wet, it is slippery" and see what that gives us. It says "I learned something." Let's look at the knowledge base — we have a bunch of things now. These are the things I added: I added "rain," I added "rain implies wet," I added "wet implies slippery." From the first modus ponens that we applied, we got "wet." We can apply modus ponens on "wet" and "wet implies slippery" and get "slippery," so "slippery" is added. We also get another formula here, "rain implies slippery" — and modus ponens actually doesn't get us that. This one comes from other types of inference rules, not just modus ponens; if you apply other inference rules you might actually get "rain implies slippery" here.
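The derivation loop just demonstrated can be sketched in a few lines of Python. This is a minimal stand-in for the demo system shown in the lecture, not that system itself; the `forward_chain` name and the (premises, conclusion) rule encoding are illustrative choices:

```python
# Forward chaining with modus ponens as the only inference rule.
# Facts are propositional symbols; each rule is (premises, conclusion),
# standing for p1 ∧ ... ∧ pk -> q.

def forward_chain(facts, rules):
    """Apply modus ponens repeatedly until the knowledge base stops changing."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Modus ponens: if every premise is known, add the conclusion.
            if set(premises) <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

kb = forward_chain({"rain"}, [(["rain"], "wet"), (["wet"], "slippery")])
print(sorted(kb))  # ['rain', 'slippery', 'wet']
```

Note that the loop converges exactly as described in the lecture: once "wet" and "slippery" are added, no rule fires again, and a formula like "rain implies slippery" is never produced, since modus ponens here only ever adds plain symbols.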
[00:11:04] Okay, all right, so let's look at this exact example here. "Rain" and "rain implies wet" will get us "wet," right? Then we apply modus ponens again on "wet" and "wet implies slippery," and that gets us "slippery." And if the only inference rule that we have is modus ponens, then we have converged: the knowledge base is not changing anymore. We have derived "wet" and "slippery" — we have derived new formulas — but we have basically converged at this point and we can't derive anything more. Okay. So there is a set of other things we can't derive here. For example, we haven't derived "not wet" — probably a good thing, because "not wet" is actually contradictory to our knowledge base; it's actually not true, so we shouldn't be able to derive "not wet." That's a good thing.
[00:11:59] In addition to that, we weren't able to derive "rain implies slippery," which is actually true: if you think about entailment and about what is true, "rain implies slippery" is entailed here, but we weren't able to get it by just applying modus ponens. We will talk in a few slides about why it is that we can't get "rain implies slippery," and what we can do to make sure we get everything that is entailed. And that is the same question we see here: what is the relationship between entailment and derivation? How are they related — are they the same thing or are they doing different things, and does it depend on the inference rule? Okay, all right. So the story so far is: we have semantics. Semantics is really about truth; it's
[00:12:54] about entailment, about meaning — about what is actually true. When you say a knowledge base entails f, what that means is that the models of the knowledge base are a subset of the models of f; in terms of meaning, that is what truth is. On the other hand, we've talked about syntax. In syntax we just do symbol manipulation using inference rules; we've looked at modus ponens as an inference rule, and we have looked at derivation: knowledge base derives f. Okay. So how are these two related? Let's talk about that, and that brings us to the idea of soundness and completeness. So let's look at an example. Imagine that you have a glass, and the things that go inside of the glass are
formulas. [00:13:48] And imagine that anything that is inside of the glass is the truth. What does that mean? That means that the knowledge base entails those formulas, so every formula that is true is going to be inside of the glass. The idea of soundness is that if I am applying inference rules — if I'm running a bunch of inference rules — the formulas that are derived from those inference rules should also be inside of the glass; I want to make sure that they're also true. So a set of inference rules is sound if the set of formulas derived following those inference rules is a subset of the truth, that is, the set of formulas that are entailed by the knowledge base. Okay, so they're going to be inside of the glass; they're going to be true. Maybe they don't fill the
glass — that's fine — [00:14:39] but what this is telling me is that anything I'm deriving is still going to be true; I'm not going to derive something that's absolutely false. And that's a very important property that you want to have in general: you want inference rules that are sound, because otherwise we would be deriving things that are absolutely false, and such an inference rule is not useful; we want to at least derive things that are true. Okay, so that is the idea of soundness. On the other hand, there is the other side of the story, which is completeness. Completeness is about making sure that you're deriving everything that is true. Again, remember, everything that is inside of the glass is true, and the idea of completeness is that you have to make sure that the
formulas that are entailed — [00:15:30] the formulas that are inside of the glass — are going to be a subset of the formulas that are derived. What that means is that your derivation rules get you all the formulas that are true, or even more than that: if you talk about completeness without worrying about soundness, you might even be deriving things that are outside of the glass, but you want to make sure that you are deriving everything that is inside of the glass too — everything that is entailed. That's the idea of completeness. Okay. So if you put soundness and completeness together, you get a filled-up glass: you get everything that is inside of the glass, and just everything that's inside of the glass, which is everything that
is true, everything that is entailed. [00:16:16] Okay, so soundness and completeness are about the truth, the whole truth, and nothing but the truth. Soundness gets you nothing but the truth: everything you derive is inside of the glass and nothing outside of the glass, because that would be bad — you don't want to get something false. That's what soundness gets you. Completeness gets you the whole truth: it makes sure that you get everything that is inside of the glass, that nothing in there is left out; you're deriving all the formulas that are inside of the glass, and that is what completeness gets you. In general you want both soundness and completeness — it would be awesome to get both — and if you get both of them, then entailment and derivation are equivalent.
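The glass picture can be written down compactly. As a set-theoretic paraphrase (my formalization, not verbatim from the lecture), with $\vdash$ for derivation and $\models$ for entailment:

```latex
% Soundness: everything derived is entailed ("nothing but the truth")
\{ f : \mathrm{KB} \vdash f \} \subseteq \{ f : \mathrm{KB} \models f \}
% Completeness: everything entailed is derived ("the whole truth")
\{ f : \mathrm{KB} \models f \} \subseteq \{ f : \mathrm{KB} \vdash f \}
% Both together: derivation and entailment coincide
\mathrm{KB} \vdash f \iff \mathrm{KB} \models f
```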
[00:17:00] If you derive something, it's equivalent to the thing that you're entailing. In practice, soundness is more important, because you don't want to derive something that is false; maybe you don't get all the truth, but maybe that is okay. So in practice we prefer to get soundness first, and then push towards completeness. Okay, so going back to here: soundness and completeness are the thing that connects these two together — soundness and completeness make sure that entailment and derivation are equivalent; I should write that here too. Okay, so we talked about soundness and completeness as things that relate entailment and derivation. All right. And these are properties of inference rules, so the question is: is modus ponens sound, and is modus ponens complete? What can we say about modus ponens? Because that's the only inference rule we
have seen so far. Okay. [00:18:08] So remember modus ponens: we have "rain" and "rain implies wet," and modus ponens gets us "wet." Is that sound? How do we check soundness? Soundness is about the meaning: it's about checking whether the thing you're getting is actually inside of the glass — actually entailed. So how do we check that? We look at the models of "rain" — this shaded area — and we look at the models of "rain implies wet" — this other shaded area — and we take the intersection of them, because the models of these two formulas together is the intersection of the two sets of models; that is the darker area. And the thing we're going to check is whether this darker area is a subset of the models of "wet": is it entailed? We're checking entailment because
that is about the truth right like that is the thing that checks the truth so [00:19:01] is the thing that checks the truth so models of wet is here [00:19:03] models of wet is here and then yeah this darker area is [00:19:05] and then yeah this darker area is actually a subset of models of it so it [00:19:08] actually a subset of models of it so it turns out that modus ponens is actually [00:19:10] turns out that modus ponens is actually sound right like we are getting we are [00:19:12] sound right like we are getting we are we are inferring formulas that are [00:19:14] we are inferring formulas that are actually true okay [00:19:16] actually true okay so it is sound [00:19:18] so it is sound let's look at the difference in french [00:19:19] let's look at the difference in french rule so i have a made up inference rule [00:19:21] rule so i have a made up inference rule that says if you get wet [00:19:23] that says if you get wet and if you get rain implies wet can you [00:19:25] and if you get rain implies wet can you infer rain from that okay so you've got [00:19:28] infer rain from that okay so you've got wet and raining implies red is it [00:19:30] wet and raining implies red is it raining that's the thing you're checking [00:19:33] raining that's the thing you're checking so this inference rule similarly i can [00:19:35] so this inference rule similarly i can look at models a bit i can look at [00:19:37] look at models a bit i can look at models of that implies [00:19:39] models of that implies rain implies wet [00:19:41] rain implies wet this shaded area is going to be the [00:19:43] this shaded area is going to be the intersection and that is not a subset of [00:19:46] intersection and that is not a subset of models of rain right like as you can see [00:19:47] models of rain right like as you can see here that's not a subset of models of [00:19:49] here that's not a subset of models of rain so what that means is we don't have [00:19:51] rain so what that 
[00:19:53] So what that means is we don't have entailment here, and because of that, this particular inference rule is actually not sound. All right, so the nice thing about modus ponens is that it's actually sound. But the next question to ask is: is modus ponens complete? And I want you to remember the example we looked at: we got the formula "rain implies slippery," and that wasn't from modus ponens — modus ponens wasn't able to get it. This kind of gives us a hint that modus ponens is not complete: it's not going to get everything that is actually entailed, everything that is actually true. But let's look at an example — I'm not going to do justice to proving that modus ponens is not complete; I'm mainly just going to look at a few examples. So let's look at another example here. Let's say our knowledge base is: it is raining, and if it is raining or snowing
[00:20:45] it will be wet. Okay. So the question is: can we infer "wet" using our modus ponens rule? The first question is: is it actually true that it would be wet? Think about it intuitively, logically: you know it's raining, and you know that if it is raining or snowing then it's going to be wet. So then it's got to be wet, right? It's raining, so it's got to be wet. If you just think about it intuitively, you realize that "wet" has to be entailed here — from a meaning perspective, "wet" should be included; we should be able to get to it and incorporate it into the knowledge base. But modus ponens is not able to infer that. Why is it not able to infer that? Because in modus ponens we
[00:21:38] have this very specific syntactic form of f, and f implies g, and this formula doesn't match that: it has this "or," and modus ponens doesn't really have ors in it — it doesn't have any branchings — and because of that I can't apply modus ponens here. So the knowledge base here actually entails "wet" — it is going to be wet — but syntactically, using just modus ponens, I'm not going to be able to derive it. And based on this example you can kind of see that modus ponens is not complete: we're not able to derive everything. Okay. One other thing I want to note here: modus ponens is kind of interesting — it's just looking at positive information. You have a bunch of positive clauses — sorry, positive formulas — and based on those formulas you're able to infer something
[00:22:26] positive, and again infer something positive, and again infer something positive. It doesn't really have these ors or negations, and that is why it's not able to infer this particular formula: we have an "or" here, and modus ponens is not able to capture that. And again, it's applying things syntactically, so it doesn't care about meaning. So how can we fix this? Going back here: we just saw that modus ponens is sound — that is great — but it was not complete. Ideally I want both soundness and completeness, because ideally what I'm deriving would be equivalent to what I'm entailing; I want both of them. So the question we're asking now is: how can we fix the fact that modus ponens is not complete? And that's the topic of the next few modules.
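The incompleteness example can be sketched by the same model-enumeration idea (the predicate encoding and variable names are illustrative, not from the lecture): the knowledge base {rain, (rain or snow) -> wet} semantically entails "wet," yet modus ponens has no rule form to match against.

```python
from itertools import product

syms = ["rain", "snow", "wet"]

def kb(m):
    # Knowledge base: rain, and (rain or snow) -> wet.
    return m["rain"] and ((not (m["rain"] or m["snow"])) or m["wet"])

# Semantic check: every model of the KB satisfies "wet", so KB entails wet.
entails_wet = all(m["wet"]
                  for vals in product([False, True], repeat=len(syms))
                  for m in [dict(zip(syms, vals))]
                  if kb(m))
print(entails_wet)  # True

# Syntactic check: modus ponens only matches rules of the form
# (symbol ∧ ... ∧ symbol) -> symbol, and "(rain or snow) -> wet" has no
# such encoding, so no rule ever fires and "wet" is never derived.
facts = {"rain"}
rules = []  # the or-rule cannot be written as (premise symbols, conclusion)
derived = set(facts)
for premises, conclusion in rules:
    if set(premises) <= derived:
        derived.add(conclusion)
print("wet" in derived)  # False
```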
[00:23:16] So we have two options to fix completeness. The first option is to restrict the propositional formulas: maybe propositional logic is too large, and maybe we can restrict it to a fragment of propositional logic that only has these things called horn clauses. And in that scenario — propositional logic with only horn clauses — it turns out that modus ponens is both sound and complete. The other option is: maybe I don't want to change my propositional logic — I want to keep all of propositional logic — but maybe I should be looking at more powerful inference rules. Modus ponens seems pretty simple; maybe there are more powerful inference rules that I can use, and specifically resolution is an inference rule that we're going to be talking about which is both sound and complete. So next module we'll be talking
[00:24:02] about horn clauses — propositional logic with horn clauses — and the fact that modus ponens is sound and complete there; and in the module after that, we'll be talking about resolution, how we can use resolution, and the fact that it is both sound and complete.

================================================================================
LECTURE 045
================================================================================

Logic 5 - Propositional Modus Ponens | Stanford CS221: AI (Autumn 2021)

Source: https://www.youtube.com/watch?v=6bj4z2mt1KE

---

Transcript

[00:00:05] Okay, so in this module we would like to talk about horn clauses — specifically, how modus ponens applies to propositional logic with only horn clauses, and how we can show soundness and completeness in that setting. Okay. So to do that we have to define a few other things, so let me go back to my definitions here. We've been talking about inference rules; we've been talking about modus ponens, derivation, and proving; and we've talked about soundness and completeness. You've seen
that modus ponens is sound but it is not complete, and as a way of fixing that we thought maybe we should restrict our formulas to formulas that only have horn clauses. [00:00:46] So we need to define what a horn clause is, and to define what a horn clause is we have to define what a definite clause is. So I'm going to define a definite clause, and I'm going to define a goal clause, and a horn clause is basically a clause that is either a definite clause or a goal clause; I'll define each of these in a second. [00:01:12] Okay, so what is a definite clause? A definite clause is a clause that has the following form: p1 ANDed through pk implying q, where p1 through pk and q are propositional symbols. One thing I want to mention is that k could be zero too, so you could have almost like true
implies q, so you would end up with just q; so that is also a definite clause. [00:01:40] So here are some examples of definite clauses. Rain AND snow implying traffic is a definite clause, because it has this form of p1 ANDed through pk implying q. Traffic itself is also a definite clause, so q by itself is a definite clause. Not traffic, the negation of traffic, is not a definite clause, because you can't have any negations here, right, these are propositional symbols. And rain AND snow implying traffic OR peaceful is not a definite clause, because we have this OR here. [00:02:14] So again, a definite clause has this form of just positive information implying something positive. And in addition to definite clauses we also have this other thing, and that is called a goal clause. A goal clause is a clause of this
form: p1 ANDed through pk implying false. [00:02:35] So this clause is called a goal clause; for example, traffic AND accident implying false is going to be a goal clause. So what is a horn clause? A horn clause is a clause that is either a definite clause or a goal clause. [00:02:50] And the reason I'm separating out goal clauses here is that goal clauses have a specific form: they're equivalent to the negation of whatever comes first, because the implication is the negation of the left side, or false, and the "or false" goes away. So it's basically just the negation of this first part. And what is the negation of this first part? That is the negation of (traffic and accident), which is negation of traffic or negation of accident. So basically you can think of it as a bunch of ORs of a bunch of negations, and that acts as a goal clause.
And that is also allowed when you talk about horn clauses in general. [00:03:27] All right, so that's a horn clause. Then I'm going to expand this idea of modus ponens. We talked about modus ponens being of the form: p, and p implies q, give us q, right. So the more general modus ponens for horn clauses is of this form: p1 through pk, and p1 through pk ANDed together implying q, give us q. Here is an example: let's say it is wet and it's a weekday, and if it is wet and it is a weekday there is traffic; so this is going to imply traffic for us. That's just a more general form of modus ponens. [00:04:07] All right, so then we have basically this theorem, and the theorem says that if I apply this modus ponens only on horn clauses, then I'm going to get completeness. So modus ponens is complete with respect to horn clauses.
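A single application of this generalized rule might look like the following sketch; the `(premises, conclusion)` encoding and the function name are assumptions for illustration:

```python
# Sketch of the generalized modus ponens rule for definite clauses:
# from p1, ..., pk and (p1 ∧ ... ∧ pk → q), conclude q.

def modus_ponens(derived, clause):
    """Return the conclusion q if every premise is already derived, else None."""
    premises, conclusion = clause
    if all(p in derived for p in premises):
        return conclusion
    return None

derived = {"wet", "weekday"}
rule = (["wet", "weekday"], "traffic")  # wet ∧ weekday → traffic
print(modus_ponens(derived, rule))      # traffic
```
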
[00:04:22] And what that means is: suppose you have a knowledge base that only has horn clauses, p is a symbol, and p is entailed by this knowledge base; then if I just apply modus ponens, if I just apply this particular inference rule, I will be able to derive p. [00:04:39] And that's pretty nice, because in general, remember the ask and tell operators: if you ask me "is p true?", you're really asking me whether p is entailed by the KB. And instead of doing something of the form of model checking and satisfiability and things of those forms that we have talked about, instead of doing all of that and trying to figure out whether the knowledge base really entails p or not, what I can do is basically a symbol manipulation: I can just apply modus ponens on my
knowledge base and see if i can derive it like syntactically or not and then if [00:05:14] it like syntactically or not and then if i can then then then this derivation and [00:05:17] i can then then then this derivation and entailment are equivalent right like if [00:05:19] entailment are equivalent right like if i can derive this based on syntax and [00:05:21] i can derive this based on syntax and based on modus ponens then i would be [00:05:23] based on modus ponens then i would be able to say that the knowledge base also [00:05:25] able to say that the knowledge base also entails p [00:05:27] entails p so going back to this diagram that we [00:05:29] so going back to this diagram that we had before right so so we will have [00:05:32] had before right so so we will have soundness and completeness meaning that [00:05:34] soundness and completeness meaning that um this idea of derivation knowledge [00:05:37] um this idea of derivation knowledge base knowledge base deriving g is going [00:05:39] base knowledge base deriving g is going to be equivalent to knowledge base [00:05:41] to be equivalent to knowledge base entailing g so if you ask me is g true [00:05:44] entailing g so if you ask me is g true or like if you want to add g to the [00:05:46] or like if you want to add g to the knowledge base remember that ask and [00:05:48] knowledge base remember that ask and tell operations that's about asking for [00:05:51] tell operations that's about asking for entailment right and if it is asking for [00:05:53] entailment right and if it is asking for entailment the end right if i'm in a [00:05:55] entailment the end right if i'm in a space where i have sound the same [00:05:56] space where i have sound the same completeness of my inference rules modus [00:05:58] completeness of my inference rules modus ponens in this case then i can just do [00:06:00] ponens in this case then i can just do this derivation which is much simpler [00:06:05] this derivation which is much 
simpler. [00:06:06] All right, so let's just look at an example here. Let's say that my knowledge base is the following set of formulas, and my modus ponens rule is this more general rule: p1 through pk, and p1 through pk ANDed together implying q, give me q. So what happens here? If you ask me, based on this knowledge base, is there traffic, what I can do is check whether the knowledge base derives traffic. And how do I do that? Well, I have rain, and rain implies wet, so if I apply modus ponens on my knowledge base I get wet. I know that it's a weekday, that's in my knowledge base, and I've got wet and added that to my knowledge base. I also have wet AND weekday implies traffic in my knowledge base. With all these three together I can infer, I can derive, traffic.
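The derivation in this example can be sketched as a simple forward-chaining loop that applies modus ponens until nothing new can be derived. The fact/rule encoding is an assumption for illustration:

```python
# Sketch: deriving "traffic" by repeatedly applying generalized modus ponens
# (forward chaining) over a horn-clause knowledge base. Assumed encoding:
# facts are symbols, rules are (premises, conclusion) pairs.

facts = {"rain", "weekday"}
rules = [
    (["rain"], "wet"),                # rain → wet
    (["wet", "weekday"], "traffic"),  # wet ∧ weekday → traffic
]

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        # Modus ponens: if every premise holds, derive the conclusion.
        if conclusion not in facts and all(p in facts for p in premises):
            facts.add(conclusion)
            changed = True

print("traffic" in facts)  # True: the knowledge base derives traffic
```

Because modus ponens is sound and complete for horn clauses, this syntactic loop answers the entailment question without any model checking.
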
and we [00:06:56] knowledgebase derives traffic and we have soundness and completeness because [00:06:58] have soundness and completeness because we are looking at only horn clauses we [00:07:00] we are looking at only horn clauses we are able to say the knowledge base here [00:07:02] are able to say the knowledge base here in this case entails traffic [00:07:05] in this case entails traffic all right so this is kind of like an [00:07:08] all right so this is kind of like an overview of what we have talked about so [00:07:09] overview of what we have talked about so far we've talked about formulas that's [00:07:11] far we've talked about formulas that's in the syntax land they have meanings in [00:07:13] in the syntax land they have meanings in the semantic line we have models for [00:07:15] the semantic line we have models for each of them and then in the semantic [00:07:17] each of them and then in the semantic land if you want to check if you want to [00:07:19] land if you want to check if you want to check something is entailed or not we [00:07:20] check something is entailed or not we have to do satisfiability right we have [00:07:22] have to do satisfiability right we have to have to do model checking and that [00:07:24] to have to do model checking and that was quite involved so instead of doing [00:07:26] was quite involved so instead of doing that if we have a set of inference rules [00:07:28] that if we have a set of inference rules that are going to be sound and complete [00:07:30] that are going to be sound and complete either because maybe our formulas are [00:07:32] either because maybe our formulas are restricted or maybe our inference rules [00:07:34] restricted or maybe our inference rules are fancier then we are able to derive [00:07:37] are fancier then we are able to derive the formula and and that derivation if [00:07:39] the formula and and that derivation if you have soundness and completeness that [00:07:41] you have soundness and 
And that derivation, if you have soundness and completeness, is the same thing as checking entailment. [00:07:45] So in this module we've talked about horn clauses, a kind of restricted version of formulas where we can apply modus ponens. In the next module we'll be talking about resolution, a fancier inference rule, as opposed to changing our formulas, in order to get both soundness and completeness.
================================================================================ LECTURE 046 ================================================================================
Logic 6 - Propositional Resolutions | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=egLAF4dFdBo
---
Transcript
[00:00:05] so in this module we're going to be talking about resolution, which is an inference rule. So far we've been talking about propositional logic, we've been talking about syntax and semantics of propositional logic, and we discussed one inference rule, specifically modus ponens. And the idea of an inference rule is: can we do manipulation of syntax, in the syntactic land, over formulas, in
order to derive, in order to prove, a new formula. And the question is: is that inference rule, under that specific set of logical formulas, sound and complete? [00:00:37] What we have seen is that if I apply just modus ponens on propositional logic, I get soundness but I don't get completeness. And what that means is that if I have a bunch of formulas that are entailed, that are true, I'm not going to be able to get all of them if I apply modus ponens on propositional logic. [00:00:54] So we talked about two ways of solving that, and we discussed the first way: the first idea was, instead of looking at all of propositional logic, let's look at a subset of it, and that subset is propositional logic with only horn clauses. So we defined horn clauses in the last module, and we looked at propositional logic with only horn clauses, and in that case if I apply modus ponens then I get
soundness and completeness, and everything is great. [00:01:19] The other option is: what if I don't want to limit my propositional logic, what if I want to look at all of propositional logic — can I make my inference rule a little bit fancier, a little bit more powerful? So in this module we are going to be talking about a new type of inference rule, specifically called resolution, as a way of getting both soundness and completeness. [00:01:40] All right, so to start with I want to just write out a few things that we're all aware of, but let's get on the same page on all of them. So if we have p implies q, what is that equivalent to? That is equivalent to negation of p, or q. Let's write out some more of these equivalences here. If I have negation of (p and q),
what is that equivalent to? Well, I can apply De Morgan's law, and that gets me negation of p, or negation of q. And then if I have negation of (p or q), what is that going to be? That is going to be equal to negation of p, and negation of q. [00:02:38] All right, so these are a few equivalences that we all agree on; this is just how they are, it's just truth, right. If you look at the truth tables of these you're going to get these equivalences. And the reason I'm writing out these equivalences is that in general I would like to write everything in the form of disjunctions and conjunctions. [00:03:00] Okay, so let me define a few other things here. I'm going to define a literal as a propositional symbol p, or the negation of a propositional symbol, negation of p. So a literal is just p or negation of p,
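The three equivalences above (implication elimination and both De Morgan laws) can be checked mechanically by enumerating every truth assignment, as a small sketch:

```python
# Brute-force check of the stated equivalences over all truth assignments:
#   p → q      ≡  ¬p ∨ q
#   ¬(p ∧ q)   ≡  ¬p ∨ ¬q   (De Morgan)
#   ¬(p ∨ q)   ≡  ¬p ∧ ¬q   (De Morgan)
from itertools import product

for p, q in product([False, True], repeat=2):
    implies = q if p else True                          # material implication p → q
    assert implies == ((not p) or q)                    # implication elimination
    assert (not (p and q)) == ((not p) or (not q))      # De Morgan 1
    assert (not (p or q)) == ((not p) and (not q))      # De Morgan 2

print("all equivalences hold on every row of the truth table")
```
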
where p is just a propositional symbol. [00:03:17] So then, based on that, one can define a clause to be a disjunction of literals. We talked about horn clauses in the last module, but we never defined what a clause is: a clause is just an OR of a bunch of literals, a disjunction of a bunch of literals. So I can have a clause that's like p1, or negation of p2, or p3; this is a clause, because it's just an OR of a bunch of literals. [00:03:48] So then the question is, what is a horn clause? We did define horn clauses last lecture, but we can think about horn clauses a little bit differently here. A horn clause is basically a clause, a disjunction of a bunch of literals, with at most one positive literal. So I'm going to refer to p as a positive literal,
[00:04:09] and negation of p as a negative literal. And a horn clause basically says you have at most one positive literal in your clause. For example, the clause I've written here, p1 or negation of p2 or p3, is not a horn clause, because it has two positive literals, p1 and p3. But I can have another clause, p1 or negation of p2 or negation of p3, and this is going to be a horn clause, because it has at most one positive literal, and that is p1. So this is just another way of looking at horn clauses. [00:04:46] So going back here: we have a implies c; how can we write it? We can write it as negation of a, or c. We have a AND b implying c; what is that equal to? It's the negation of this first part, or c; I can use De Morgan's law, and that gives me negation of a, or negation of b, or c. Again, this is a clause now, and it's a horn clause.
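This disjunctive view of horn clauses can be sketched directly: represent a clause as a set of literals and count the positive ones. The `(symbol, sign)` pair encoding is an assumption for illustration:

```python
# Sketch: a clause as a set of literals, where a literal is a (symbol, sign)
# pair — sign True for p, False for ¬p. Assumed encoding, not the lecture's.

def is_horn_clause(clause):
    """A horn clause is a clause with at most one positive literal."""
    positives = sum(1 for _symbol, sign in clause if sign)
    return positives <= 1

c1 = {("p1", True), ("p2", False), ("p3", True)}   # p1 ∨ ¬p2 ∨ p3
c2 = {("p1", True), ("p2", False), ("p3", False)}  # p1 ∨ ¬p2 ∨ ¬p3
print(is_horn_clause(c1))  # False: two positive literals, p1 and p3
print(is_horn_clause(c2))  # True: at most one positive literal, p1
```
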
[00:05:10] And again, going over what I've defined so far: a literal is a propositional symbol, either positive or negative, either p or negation of p; a clause is just a disjunction of these literals; and a horn clause is just a clause with at most one positive literal. [00:05:27] All right, so now when I'm thinking about modus ponens, I can actually write it out using clauses. Remember, I have a, and a implies c, and that gets me c; that is what modus ponens tells me. Instead of a implies c, I can just write it as a clause: negation of a, or c. And intuitively, what is really happening is that you're cancelling out a and negation of a; that's why we are getting c. [00:05:56] And the reason I'm rewriting modus ponens like this is that it helps us think about the more general resolution
rule that i'll be talking about in a few slides okay [00:06:06] talking about in a few slides okay so the idea of resolution is i don't [00:06:09] so the idea of resolution is i don't want to limit myself to specific types [00:06:11] want to limit myself to specific types of clauses i can talk about general [00:06:13] of clauses i can talk about general clauses and general clauses are what are [00:06:15] clauses and general clauses are what are they there are disjunctions of of [00:06:18] they there are disjunctions of of positive or negative literals [00:06:21] positive or negative literals and the idea of resolution is if you [00:06:23] and the idea of resolution is if you have a bunch of clauses [00:06:25] have a bunch of clauses you'll have a rule you'll have an [00:06:26] you'll have a rule you'll have an inference rule that cancels out your [00:06:29] inference rule that cancels out your positive and negative literals so here's [00:06:32] positive and negative literals so here's an example so if it is raining or [00:06:34] an example so if it is raining or snowing that's part of your knowledge [00:06:36] snowing that's part of your knowledge base [00:06:37] base and if it is not snowing or there is [00:06:40] and if it is not snowing or there is traffic [00:06:41] traffic one can infer that it is raining or [00:06:44] one can infer that it is raining or there is traffic [00:06:45] there is traffic why let's think about like why can't we [00:06:47] why let's think about like why can't we why can't we infer this even intuitively [00:06:50] why can't we infer this even intuitively okay [00:06:50] okay so so if it is snowing right so so so if [00:06:54] so so if it is snowing right so so so if it is snowing then there's got to be if [00:06:56] it is snowing then there's got to be if the snowing is true right there's got to [00:06:59] the snowing is true right there's got to be traffic okay so that's how i get [00:07:01] be traffic okay so that's how i get traffic 
[00:07:04] And if it is not snowing, then there's got to be rain, because it's either snowing or raining; that's how I get rain. So intuitively that is why you're getting this rain-or-traffic, and in some sense you can think about snow and negation of snow cancelling each other out, because whether it is snowing or not snowing, you are going to get traffic or rain out of it. And this is basically the resolution inference rule applied to one example. [00:07:32] One can think about this much more generally: take a clause f1 or ... or fn or p, and another clause negation of p or g1 or ... or gm. The idea of the inference rule is that, based on these two premises, you can conclude a new clause, f1 or ... or fn or g1 or ... or gm, that cancels out p and negation of p. This is called resolution.
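The general rule can be sketched on the same set-of-literals encoding used above (a `(symbol, sign)` pair per literal — an assumed representation, not the lecture's notation):

```python
# Sketch of the propositional resolution rule on clauses represented as
# frozensets of (symbol, sign) literals:
# from f1 ∨ ... ∨ fn ∨ p and ¬p ∨ g1 ∨ ... ∨ gm,
# conclude f1 ∨ ... ∨ fn ∨ g1 ∨ ... ∨ gm.

def resolve(c1, c2):
    """Return every clause obtainable by cancelling one complementary pair."""
    results = []
    for symbol, sign in c1:
        if (symbol, not sign) in c2:
            # Drop p from one clause and ¬p from the other, union the rest.
            resolvent = (c1 - {(symbol, sign)}) | (c2 - {(symbol, not sign)})
            results.append(resolvent)
    return results

rain_or_snow = frozenset({("rain", True), ("snow", True)})          # rain ∨ snow
not_snow_or_traffic = frozenset({("snow", False), ("traffic", True)})  # ¬snow ∨ traffic
print(resolve(rain_or_snow, not_snow_or_traffic))
# cancels snow / ¬snow, leaving the clause rain ∨ traffic
```
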
sound? So that's a very good question to ask. In general we want it to be sound, because we want to be able to derive things that are actually true. So, remembering this example, is it true that I can derive Rain ∨ Traffic here? How do I check that? Well, to check soundness I need to actually get to the models, the meanings, of each one of these formulas, and I need to check entailment. So let's check that on this example. If I have Rain ∨ Snow, what are the models of Rain ∨ Snow? My truth table here is going to be a little bit larger, because I have Snow, Rain, and Traffic, so I need to look at 0/1 values for all of them. Rain ∨ Snow corresponds to these shaded areas: that's the models of Rain ∨ Snow. And then I have the models of ¬Snow ∨ Traffic, which correspond to these other shaded areas.

[00:08:52] And remember, as I add more formulas to my knowledge base, I'm shrinking its set of models, right? I'm adding more constraints, so I'm shrinking the models. That is why the set of models of these two formulas together is the intersection of their models, and the intersection is this darker red area. So if I'm checking whether resolution is sound, I should be checking entailment, and what that means is I should be checking whether the models of what is in my knowledge base are a subset of the models of the new formula that I'm trying to derive. And what's the new formula I'm trying to derive here? Resolution tells me I can derive Rain ∨ Traffic, and if I look at the models of Rain ∨ Traffic, I get this green area. So the question is: is the dark red area a subset of the green area? And in this case it is, [00:09:47] so it turns out that resolution is actually sound. In terms of thinking about the models, thinking about the semantics here, we are getting soundness; we are ensuring that we derive truth by applying resolution. Okay, so resolution is sound.

[00:10:03] Now, as you've seen, resolution only works on clauses, right? I've been defining these clauses, which are disjunctions of literals, and the question is: can I apply resolution to all of propositional logic? And the answer is yes. It turns out that even though resolution only works on clauses, that is actually enough, and the reason it is enough is that you can take any propositional formula and write it as a conjunction of a bunch of clauses, and that's called conjunctive normal form. Okay, so a
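The soundness check described here, enumerating all truth assignments, intersecting the models of the premises, and testing the subset relation, is small enough to spell out in code. The representation below (Python dicts as models) is an illustrative choice of mine, not the lecture's notation:

```python
from itertools import product

symbols = ["Rain", "Snow", "Traffic"]

# Every model assigns True/False to each of the three symbols: 2^3 = 8 rows.
models = [dict(zip(symbols, bits)) for bits in product([True, False], repeat=3)]

premise1 = lambda m: m["Rain"] or m["Snow"]           # Rain v Snow
premise2 = lambda m: (not m["Snow"]) or m["Traffic"]  # !Snow v Traffic
conclusion = lambda m: m["Rain"] or m["Traffic"]      # Rain v Traffic

# Models of the knowledge base = intersection of the premises' models
# (the "darker red area" in the lecture's picture).
kb_models = [m for m in models if premise1(m) and premise2(m)]

# Entailment: every KB model must also satisfy the conclusion.
print(all(conclusion(m) for m in kb_models))   # True: this instance is sound
```

Enumerating all 2^n models like this is exactly why model checking doesn't scale, which motivates inference rules in the first place, but for three symbols it makes the subset check completely explicit.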
conjunctive normal form, a CNF, formula is a conjunction of clauses. [00:10:44] So an example of that: you have a clause A ∨ B ∨ ¬C, you have another clause ¬B ∨ D, and the AND of these two clauses is in conjunctive normal form. You can think of this as the equivalent of having a knowledge base where each formula is a clause; when you have a bunch of formulas in your knowledge base, you're basically thinking about the AND of those formulas. So a knowledge base is basically a conjunction of a bunch of formulas that could be written, let's say, as clauses.

[00:11:22] All right, so then basically every formula that is written in propositional logic can be converted into conjunctive normal form, into a new formula in conjunctive normal form that's exactly equivalent: the models of the old formula are exactly equal to the models of the new formula. So how can we do that? There's actually a kind of easy way of doing it; there's just a recipe for converting every formula to conjunctive normal form. Let's look at an example. Let's say you have a formula that says Summer implies Snow, and the whole thing implies Bizarre: (Summer → Snow) → Bizarre.

[00:12:02] Here I don't have any ANDs or ORs, right? I have these implications, so I need to get rid of them. How can I do that? I can basically remove an implication and write it out in the form that I talked about earlier, which is the negation of the first term OR the second term. So this outer implication I can write as the negation of the whole first term OR the second term: ¬(Summer → Snow) ∨ Bizarre. I can remove the inner implication and write it in a similar way, giving ¬(¬Summer ∨ Snow) ∨ Bizarre. So now what I'm going to do is push the negation inside using De Morgan's law: pushing the negation inside turns the OR into an AND, and I get a double negation, which I can get rid of, making Summer positive: (Summer ∧ ¬Snow) ∨ Bizarre.

[00:12:44] So now I have a bunch of literals, positive or negative, and I only have ANDs and ORs, but this is actually not in conjunctive normal form, right? Because conjunctive normal form means an AND of a bunch of ORs, and this is actually the opposite: this is an OR of a bunch of ANDs. But you can actually distribute this OR over the AND, and if you distribute the OR over the AND, you end up with these two clauses: Summer ∨ Bizarre, and another clause which is ¬Snow ∨ Bizarre.

[00:13:15] Okay, so you end up in CNF form; any formula you give me, I can put into CNF form. So the general recipe for it is: if you have bidirectional implications, replace them with implications and ANDs, writing each bidirectional implication out as implications ANDed together; if you see an implication, write it out in the form of a negation and an OR; if you have any negations, move them inside using De Morgan's laws; if you have double negations, remove the double negations; and then at the end, just distribute OR over AND wherever you have anything of that form, and you'll end up in conjunctive normal form. So that is the general recipe for converting any propositional logic formula to CNF form. [00:14:03] And then, why are we writing this in CNF form? Because the resolution rule works
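As a sketch, the recipe above can be turned into code almost step for step. Formulas here are nested tuples such as ("implies", a, b); this tiny AST and the function names are my own, chosen just to mirror the lecture's steps, not any standard library API:

```python
def elim_implies(f):
    """Step 1: rewrite (a -> b) as (!a v b), recursively."""
    op = f[0]
    if op == "var":
        return f
    if op == "not":
        return ("not", elim_implies(f[1]))
    if op == "implies":
        return ("or", ("not", elim_implies(f[1])), elim_implies(f[2]))
    return (op, elim_implies(f[1]), elim_implies(f[2]))

def push_neg(f):
    """Step 2: push negations inward (De Morgan), dropping double negations."""
    op = f[0]
    if op == "var":
        return f
    if op == "not":
        g = f[1]
        if g[0] == "var":
            return f
        if g[0] == "not":                        # !!a  ->  a
            return push_neg(g[1])
        flip = "or" if g[0] == "and" else "and"  # De Morgan swaps and/or
        return (flip, push_neg(("not", g[1])), push_neg(("not", g[2])))
    return (op, push_neg(f[1]), push_neg(f[2]))

def distribute(f):
    """Step 3: distribute OR over AND until we have an AND of clauses."""
    op = f[0]
    if op in ("var", "not"):
        return f
    a, b = distribute(f[1]), distribute(f[2])
    if op == "or" and a[0] == "and":
        return ("and", distribute(("or", a[1], b)), distribute(("or", a[2], b)))
    if op == "or" and b[0] == "and":
        return ("and", distribute(("or", a, b[1])), distribute(("or", a, b[2])))
    return (op, a, b)

def to_cnf(f):
    return distribute(push_neg(elim_implies(f)))

# The lecture's example: (Summer -> Snow) -> Bizarre
f = ("implies", ("implies", ("var", "Summer"), ("var", "Snow")), ("var", "Bizarre"))
print(to_cnf(f))
# An ("and", ...) of two clauses: (Summer v Bizarre) and (!Snow v Bizarre)
```

One caveat worth knowing: distributing OR over AND can blow up the formula size exponentially in the worst case, which is part of why CNF conversion is cheap to describe but not always cheap to run.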
only on clauses, which is to say it only works on CNF formulas.

[00:14:14] All right, so what's the idea of the resolution algorithm? Well, why are we trying to run resolution? The reason is that in general you might be asking me whether some formula f is true or not; we care about having that assistant that we can ask things of, or tell things to, and what does that assistant do? It tries to check things like entailment. So if you want to check whether the knowledge base entails a new formula f, that's the same thing, right, as checking whether the knowledge base contradicts ¬f, or basically checking whether ¬f added to the knowledge base is unsatisfiable or not.

[00:14:54] So how do we run the resolution-based algorithm? Well, what we do is: if you ask me whether f is entailed or not, I'll add ¬f to my knowledge base, and then I convert all my formulas to CNF form (we can do that), and once I have everything in CNF form, I can repeatedly apply resolution until everything has converged, and then I return entailment if and only if I derive false. So that is how we run resolution if we want to answer a question about entailment.

[00:15:27] Let's look at an example here. Say I have a knowledge base, and it has a bunch of things in it; they're not in CNF form, they're not in clause form or anything, but I have a bunch of formulas, and you're asking me whether this knowledge base entails a new formula, and that new formula is C. So how do I check that using resolution? What I'm going to do is add ¬C to my knowledge base, and convert everything to CNF form [00:15:53] using that recipe I talked about: removing implications, pushing negations in, and distributing ORs over ANDs. Once I do that, everything is in clause form; I have clauses and I have literals. So this is my knowledge base, with everything in clause form, in CNF form.

[00:16:11] And then I'm going to repeatedly apply resolution. So how do I apply resolution? Let's start from these two: I have A, and I have ¬A ∨ B ∨ C. In some sense A and ¬A get cancelled out, so I can add B ∨ C to my knowledge base using resolution. Okay, I have ¬B in my knowledge base, so ¬B and B get cancelled out, and I can add C to my knowledge base. And I've added ¬C to my knowledge base, so ¬C and C get cancelled out, and I get false. [00:16:46] So after repeatedly applying resolution here I'm getting false, meaning that when I added the negation of the formula, I was able to get this contradiction, I was able to get false, and what that means is that the knowledge base actually entails the formula, the formula being C in this case. So KB entails C; yes, I can derive C.

[00:17:05] All right, so a good question to ask is: what is the time complexity of these algorithms? So if you remember modus ponens, the idea of modus ponens, in the more general form of it, was that at every step we would add at most one propositional symbol to our knowledge base, and if you're adding one propositional symbol at a time and you have, say, n of them, you have at most n things to go over. So this would be a linear-time algorithm; running modus ponens is pretty simple;
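Before moving on, the whole entailment procedure from the last few paragraphs (add the negated query, saturate under resolution, and report entailment exactly when false, the empty clause, appears) can be sketched as follows. The clause encoding, frozensets of string literals with "!" for negation, is again my own illustrative choice, not the course's code:

```python
def negate(lit):
    """Complementary literal: "B" <-> "!B"."""
    return lit[1:] if lit.startswith("!") else "!" + lit

def resolvents(c1, c2):
    """Every clause obtained by cancelling one complementary pair."""
    for lit in c1:
        if negate(lit) in c2:
            yield frozenset((c1 - {lit}) | (c2 - {negate(lit)}))

def entails(kb_clauses, negated_query):
    """True iff the KB entails the query: add the negated query and
    resolve until the empty clause (false) appears or we converge."""
    clauses = set(kb_clauses) | {negated_query}
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                for r in resolvents(c1, c2):
                    if not r:          # empty clause = false: contradiction found
                        return True
                    new.add(r)
        if new <= clauses:             # converged without deriving false
            return False
        clauses |= new

# The lecture's example, already in clause form: KB = {A, !A v B v C, !B},
# query C, so we add the negated query !C.
kb = [frozenset({"A"}), frozenset({"!A", "B", "C"}), frozenset({"!B"})]
print(entails(kb, frozenset({"!C"})))   # True: KB entails C
```

The saturation loop terminates because there are only finitely many clauses over a finite set of symbols, but as the complexity discussion that follows notes, the number of derivable clauses can be exponential in the worst case.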
Modus ponens also converges fairly quickly, because there are n things that we need to go over. [00:17:44] But when we think about this other inference rule, resolution: when we're running resolution, we are adding many propositional symbols back to our knowledge base, and in the worst case you're adding all the subsets, all the disjunctions of these symbols, into your knowledge base by the end. So what that means is you have to go over all of them, and that takes exponential time, right? So in terms of time complexity, running resolution takes exponential time.

[00:18:15] And it's actually not surprising that it takes exponential time, if you think about what resolution is doing: it's actually trying to solve a satisfiability problem. You have these clauses, and you want to check satisfiability; here you're doing model checking, and satisfiability is known to be NP-complete, so it's not surprising that running resolution until convergence actually takes exponential time.

[00:18:41] So there are really some trade-offs here. If you think about using Horn clauses, you could use modus ponens, and the nice thing about it is that it's going to be linear time, but it is less expressive: you're not able to represent everything in propositional logic, you're only limited to Horn clauses. But Horn clauses turn out to be kind of useful for many applications, especially some applications in programming languages, so in those applications it does make sense to use modus ponens, because it's faster, it takes linear time. On the other hand, if you really care about all of propositional logic, then you really care about dealing with any type of clauses, and there you have to use resolution; but the problem with resolution is that it's trying to solve an NP-complete problem, and it takes exponential time.

================================================================================
LECTURE 047
================================================================================
Logic 7 - First Order Logic | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=Z-O0Q3_oTJM
---
Transcript

[00:00:05] Okay, so in this module we would like to talk about first-order logic. So far we've been talking about propositional logic: we've talked about the syntax of propositional logic and its semantics, and we've also talked about a few different inference rules; we've talked about modus ponens and resolution. Okay, and now we want to extend our logic and make it a little bit fancier, make it a little bit more complicated, and think about first-order logic. So the first question to ask is: why do we even want to do that? Why is propositional logic
not enough? If you remember, we talked about resolution, and resolution was taking an exponential amount of time, and it seemed pretty powerful, right? If you can do that in propositional logic, it seems pretty useful, pretty powerful. So what are some of the limitations of propositional logic? [00:00:53] Let me show that with one example. Imagine we start with a sentence that says "Alice and Bob both know arithmetic." If I want to write this in propositional logic, one way of doing that is to have a set of propositional symbols, one being AliceKnowsArithmetic and another propositional symbol being BobKnowsArithmetic; these can take true or false values, and one way of writing the sentence is with this particular formula, AliceKnowsArithmetic ∧ BobKnowsArithmetic. Okay.

[00:01:27] So this seems a little weird, right? It seems like something is wrong here. So what is wrong here? If I try to extend this and write something that's slightly more complicated, this style of writing symbols and ANDing them together and so on just doesn't scale; it's not expressive enough. Let's say I write "all students know arithmetic." If I'm writing that, then I'm going to list all student names and have a single propositional symbol for each student knowing arithmetic, and AND all of those together, and that just doesn't scale if I have a lot of students. If you're in a class like 221, that wouldn't really scale: I'd need to write AliceIsStudent implies AliceKnowsArithmetic, BobIsStudent implies BobKnowsArithmetic, and each one of these is going to be a symbol that takes true/false by itself, and this is going to blow up fairly quickly.

[00:02:19] Even worse, I can have a situation where I write a statement that says "every even integer greater than two is the sum of two primes." This is actually Goldbach's conjecture. So if I want to write this in logic, well, I'm kind of stuck, right? I can't write this in propositional logic, because it's talking about every even integer, and there are an infinite number of them, so I'm not going to be able to write that in propositional logic.

[00:02:45] Okay, so what can we do? It looks like if I'm using propositional logic, it's very clunky, there are a lot of propositional symbols going on, and it just wouldn't scale. But if you think about it, when you're thinking about these statements, there are some objects here, and then some relationships, some predicates, between these objects, and maybe we can use that structure. There's quite a bit of structure here, right? Like Alice being a student, or Bob being a student: being a student is a predicate on top of this object, the object being Alice or Bob. So maybe we can use that structure, and instead of defining a single propositional symbol for everything, maybe we can talk about objects and predicates instead. [00:03:34] So what that means is that here, for example, in this other example of Alice knowing arithmetic: Alice you can think of as an object, arithmetic you can think of as an object, and knowing is a predicate on top of Alice and arithmetic, and maybe we can think about that structure. And in general there's this other view, that there are some objects and some predicates on top of them, and think of
[00:03:58] In addition to that, for that example where I was talking about every integer having a property, for those types of specifications we need to think about quantifiers: we need ways of saying "for all," or ways of saying "there exists." So we need to have a way of representing these quantifiers, and to represent a quantifier we need to have a variable. When I say "for all students," I need a variable x that corresponds to every single student. So in addition to these objects and predicates, we need to have quantifiers and variables, and then we use quantifiers and variables to represent our statements.

[00:04:42] Let me give you an example. What I want to do in this module today is talk about the syntax of first-order logic, and then talk about the semantics of first-order logic. In the next module I'll be talking about inference rules. I'm not going to do them justice here, so I'm not going to go into as much detail on syntax and semantics the same way that we did in propositional logic; it's a little bit more high level here. So let me just give you a couple of examples. If I'm saying "Alice and Bob both know arithmetic," in first-order logic, ideally I would want to be able to write something of this form: the predicate Knows over the objects alice and arithmetic should be true, and the same predicate Knows over bob and arithmetic (arithmetic is the same symbol) should be true, so Knows(alice, arithmetic) ∧ Knows(bob, arithmetic). I want to be able to capture that structure of objects and predicates in first-order logic.
[00:05:42] The other thing is, going back to this other statement, all students knowing arithmetic: I should be able to have quantifiers and variables. So ideally, if I want to write this in first-order logic, I should write something of this form: for all x, the predicate Student over the variable x should imply x knowing arithmetic, that is, ∀x Student(x) → Knows(x, arithmetic). Again, Knows is the same predicate as before. So these are just examples of first-order logic, but how do we get to these statements? For that we need to define the syntax of first-order logic, so let's get into the syntax of first-order logic.

[00:06:22] All right, so let me go to my notebook. We're going to talk about first-order logic and its syntax. When you're defining the syntax of first-order logic, we have two types of things going on: we have terms, and we have formulas. If you remember propositional logic, we only had formulas; here we first need to define a set of terms, and these terms are expressions that are referring to objects.

[00:07:04] Okay, so what are terms? The first thing that we consider as a term is a constant symbol: alice, for example, or math, or arithmetic. These are constant symbols, so a constant symbol is a term. In addition to constant symbols, we need to have variables. As I was saying earlier, if you want to be able to talk about quantifiers, those quantifiers need to be defined over variables. A variable is, when I say "for all x, x does something," that x is a variable. And in addition to that, we can have functions, and these functions are defined on some of these terms.
[00:07:51] So functions can be defined on terms, and they also give us terms. For example, I can look at a function like summing over x and 3: if I'm looking at Sum(3, x), x is a variable, 3 here is a constant symbol, summing over them is a function, and that also gives me a term.

[00:08:10] All right, so that is terms. Now I can talk about formulas. The most basic form of a formula is an atomic formula. This is actually very similar to our propositional symbols in propositional logic; it's kind of like the basis of it. In propositional logic we had these propositional symbols like p, and we would define negation on top of that, but p was this propositional symbol. The atomic formula is the basis of this logic in the same way. So what is an atomic formula? An atomic formula is a predicate applied to terms. For example, Bob knowing arithmetic is an atomic formula. I can write that as Knows(bob, arithmetic): Knows is a predicate applied to the constant symbol bob and the constant symbol arithmetic, and this whole thing is an atomic formula. Once we have atomic formulas, then what we can do is operations on top of these: we can do the same things that we did in propositional logic, we can have logical connectives applied to these formulas. These logical connectives are things like negation, or, and, implication, bi-directional implication; we can apply the same sort of things, similar to propositional logic. And in addition to that, we are going to define quantifiers.
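The pieces defined so far, terms and formulas built from atomic formulas and connectives, can be sketched as a small abstract syntax tree. This is just one possible encoding (the class names and structure are my own, not the lecture's), written in Python:

```python
# Sketch (one possible encoding, not the lecture's code): the syntax so far
# as a small abstract syntax tree. Terms refer to objects; formulas are
# predicates applied to terms, combined with logical connectives.
from dataclasses import dataclass

# --- Terms: expressions referring to objects ---
@dataclass(frozen=True)
class Const:          # constant symbol, e.g. alice, arithmetic
    name: str

@dataclass(frozen=True)
class Var:            # variable, e.g. x
    name: str

@dataclass(frozen=True)
class Func:           # function applied to terms, e.g. Sum(3, x)
    name: str
    args: tuple

# --- Formulas: things that have truth values ---
@dataclass(frozen=True)
class Atom:           # atomic formula: predicate applied to terms
    pred: str
    args: tuple

@dataclass(frozen=True)
class Not:
    f: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Implies:
    left: object
    right: object

# Knows(bob, arithmetic) ∧ Knows(alice, arithmetic)
f = And(Atom("Knows", (Const("bob"), Const("arithmetic"))),
        Atom("Knows", (Const("alice"), Const("arithmetic"))))
print(f)
```

Quantifiers, defined next, would be two more node types (ForAll, Exists) holding a variable and a body formula.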
to define [00:09:46] to that we are going to define quantifiers so quantifiers is going to [00:09:49] quantifiers so quantifiers is going to be defined on top of all of these and [00:09:51] be defined on top of all of these and these quantifiers are things like for [00:09:53] these quantifiers are things like for all or there exists okay [00:09:56] all or there exists okay all right [00:09:57] all right so this defines the syntax of our first [00:10:00] so this defines the syntax of our first order logic we have terms we have [00:10:02] order logic we have terms we have formulas and then formulas atomic [00:10:05] formulas and then formulas atomic formulas are basically going to be [00:10:07] formulas are basically going to be predicates on top of terms and once we [00:10:09] predicates on top of terms and once we have the atomic formulas we can play [00:10:11] have the atomic formulas we can play around with them using the connectives [00:10:13] around with them using the connectives logical connectives or using the [00:10:14] logical connectives or using the quantifiers [00:10:16] quantifiers let's go back here so a quick recap of [00:10:19] let's go back here so a quick recap of that constant symbols like arithmetic [00:10:21] that constant symbols like arithmetic variables just like x functions like sum [00:10:23] variables just like x functions like sum of three and x and then we have formulas [00:10:26] of three and x and then we have formulas it's referring to kind of like these [00:10:28] it's referring to kind of like these truth values uh and and atomic formulas [00:10:31] truth values uh and and atomic formulas are predicates applied to terms [00:10:33] are predicates applied to terms connectives connect them for example you [00:10:35] connectives connect them for example you might say uh x is a student implies x [00:10:38] might say uh x is a student implies x knows arithmetic so this implication is [00:10:41] knows arithmetic so this implication is 
connecting this predicate on top of [00:10:43] connecting this predicate on top of symbol on top of the variable to this [00:10:46] symbol on top of the variable to this predicate on top of the variable and the [00:10:48] predicate on top of the variable and the symbol [00:10:50] symbol and then in addition to that once we [00:10:51] and then in addition to that once we have variables we can have quantifiers [00:10:53] have variables we can have quantifiers you can say for all x if x is student [00:10:56] you can say for all x if x is student that implies x knows everything okay [00:10:59] that implies x knows everything okay all right so that summarizes the syntax [00:11:01] all right so that summarizes the syntax of first order logic [00:11:03] of first order logic one quick note on quantifiers is if you [00:11:05] one quick note on quantifiers is if you think about quantifiers quantifiers are [00:11:08] think about quantifiers quantifiers are are just slightly more complicated [00:11:10] are just slightly more complicated versions of ants and ores so if you [00:11:12] versions of ants and ores so if you think about the for all quantifier the [00:11:15] think about the for all quantifier the universal quantification you can think [00:11:17] universal quantification you can think of it literally as a conjunction okay so [00:11:19] of it literally as a conjunction okay so when i say for all x p of x that's very [00:11:23] when i say for all x p of x that's very similar to saying p of a and p of b and [00:11:25] similar to saying p of a and p of b and p of c and so on okay and and and this [00:11:28] p of c and so on okay and and and this for all is kind of like being treated as [00:11:31] for all is kind of like being treated as ants between all the possible things [00:11:33] ants between all the possible things that can attack that [00:11:35] that can attack that that this x can take okay [00:11:38] that this x can take okay similarly if you talk about existential 
[00:11:40] Similarly, if you talk about existential quantification, "there exists," that is kind of like an or; you can think of it as a disjunction. If I say ∃x P(x), it's very similar to saying P(a) ∨ P(b) ∨ P(c) and so on. And if I have a finite number of them, then I can actually unroll this and enumerate all of them.

[00:11:59] Okay, so if "for all" and "there exists" are kind of like and and or, then I can apply De Morgan's laws. What that means is, if I have a negation outside of one of these quantifiers, if I say ¬∀x P(x), that is equivalent to saying ∃x ¬P(x). Why? Because the ands are going to be flipped to become ors, so the "for all" becomes "there exists" and the negation moves inside, just like De Morgan's laws apply to ands and ors.
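Over a finite domain this unrolling is directly executable: "for all" is a big conjunction (Python's `all`) and "there exists" is a big disjunction (`any`). A small sketch (my own helper names, not from the lecture) that also exhaustively checks the quantifier form of De Morgan's law on a 3-element domain:

```python
# Sketch (not from the lecture): unrolling quantifiers over a finite domain.
from itertools import product

domain = ["a", "b", "c"]

def forall(P, domain):
    # ∀x P(x)  ≈  P(a) ∧ P(b) ∧ P(c)
    return all(P(x) for x in domain)

def exists(P, domain):
    # ∃x P(x)  ≈  P(a) ∨ P(b) ∨ P(c)
    return any(P(x) for x in domain)

# Check De Morgan for quantifiers on every unary predicate over this
# domain: ¬∀x P(x) should be equivalent to ∃x ¬P(x). There are 2^3 = 8
# possible predicates (truth assignments), so we can enumerate them all.
for truth_values in product([False, True], repeat=len(domain)):
    P = dict(zip(domain, truth_values)).__getitem__
    assert (not forall(P, domain)) == exists(lambda x: not P(x), domain)
print("De Morgan for quantifiers holds on all 8 predicates")
```

This only works because the domain is finite; for statements like Goldbach's conjecture, which range over all integers, the unrolling never terminates, which is exactly why quantifiers are needed as primitives.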
[00:12:36] Another point I want to make here: when we say "for all x there exists y," we can't flip the order. Again, just like and and or, you can't really flip this order. If I say "for all x there exists y such that x knows y," that's pretty different from saying "there exists y such that for all x, x knows y": ∀x ∃y Knows(x, y) versus ∃y ∀x Knows(x, y). So we can't simply flip their order; do not do that.

[00:12:56] Okay, so now that we know the syntax of first-order logic, let's talk about how we can start from natural language and write first-order logic. If you think about universal quantification, when we talk about "for all," the way we usually refer to that in natural language is by using a word like "every." So if I say "every student knows arithmetic," I would use the for all quantifier, ∀x, because that corresponds to every student.
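The order-of-quantifiers point can be made concrete on a tiny model. In this sketch (the relation and names are made up for illustration), everyone knows someone, but there is no single person whom everyone knows, so ∀x ∃y holds while ∃y ∀x fails:

```python
# Sketch (illustrative names, not from the lecture): quantifier order matters.
people = ["alice", "bob", "carol"]
knows = {("alice", "bob"), ("bob", "carol"), ("carol", "alice")}

def Knows(x, y):
    return (x, y) in knows

# ∀x ∃y Knows(x, y): for every person, there is someone they know.
forall_exists = all(any(Knows(x, y) for y in people) for x in people)

# ∃y ∀x Knows(x, y): there is one person whom everyone knows.
exists_forall = any(all(Knows(x, y) for x in people) for y in people)

print(forall_exists)  # True: each person knows somebody
print(exists_forall)  # False: nobody is known by everyone
```

Note the nesting of `all`/`any` mirrors the nesting of the quantifiers exactly; swapping the loops is what changes the meaning.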
[00:13:33] So a question to ask is: is this the right way of writing this natural language statement? I have "every student knows arithmetic," and suppose I write for all x, x is a student, and in addition x knows arithmetic: ∀x Student(x) ∧ Knows(x, arithmetic). But this doesn't actually correspond to the sentence; there is something a little bit subtle going on here. When you say "every student knows arithmetic," you're basically conditioning knowing arithmetic on being a student. But this statement is not doing that conditioning: it's basically saying everyone is a student and everyone knows arithmetic, and that's not right. Not everyone knows arithmetic; every student knows arithmetic. Because there is that conditioning going on, implied in the natural language, the correct way of writing this is by using an implication. If I want to write out the statement "every student knows arithmetic," I would write ∀x Student(x) → Knows(x, arithmetic): conditioned on that person being a student, x knows arithmetic, and that is an implication.

[00:14:43] We're going to have a couple of these examples in the logic assignment. I think it's a good rule of thumb to think "for all, implies" every time you see "every." This is not always true, but in general, if you see "every student" or "every person" does whatever, it's usually of the form of a for-all with an implication. How about "there exists"? Let's say we have "some student knows arithmetic." When we talk about some student, we have to use the existential quantifier.
[00:15:18] The actual correct way of writing this is to say: there exists some student, and x knows arithmetic, ∃x Student(x) ∧ Knows(x, arithmetic); an "and" is going to be sufficient here. So every time you see "some," it usually corresponds to "there exists" with an and, and every time you see "every," it usually corresponds to "for all" with an implication. Note that there are different connectives for "for all" and "there exists" when you start from natural language.

[00:15:49] Okay, let's look at a few examples; let's see if we can write these in first-order logic. The first example is: there is some course that every student has taken. How do we write this? "There is some course": there exists a y such that y is a course, ∃y Course(y). That covers the "there is some course" part.
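The "every needs an implication, some needs an and" rule can be checked on a small world. In this sketch (a hypothetical model, not the lecture's), one person is not a student, which makes the conjunction version of "every student knows arithmetic" wrongly false while the implication version behaves as intended:

```python
# Sketch (hypothetical model, not from the lecture): why "every" needs an
# implication. charlie is not a student, so ∀x Student(x) ∧ Knows(x, arith)
# is false, while ∀x Student(x) → Knows(x, arith) is true, as intended.
domain = ["alice", "bob", "charlie"]
student = {"alice", "bob"}
knows_arithmetic = {"alice", "bob"}      # every *student* knows it

# Wrong translation: forces everyone to be a student who knows arithmetic.
wrong = all(x in student and x in knows_arithmetic for x in domain)

# Correct translation, using p → q ≡ ¬p ∨ q.
right = all((x not in student) or (x in knows_arithmetic) for x in domain)

# "Some student knows arithmetic": ∃x Student(x) ∧ Knows(x, arithmetic),
# where the "and" is the right connective.
some = any(x in student and x in knows_arithmetic for x in domain)

print(wrong)  # False: charlie breaks the conjunction
print(right)  # True: the implication only constrains students
print(some)   # True: alice is a witness
```

The implication is encoded as ¬p ∨ q, the same equivalence used in propositional logic.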
[00:16:15] "That every student has taken it" is the "and" part: there is something true about this course, namely that every student has taken it. So how do we write "every student has taken it"? That says for all x, x being a student implies that that student has taken the course y. Altogether: ∃y Course(y) ∧ ∀x (Student(x) → Taken(x, y)). All right, so that's the first example.

[00:16:39] Let's look at the second example; you've seen this one earlier in this module. Every even integer greater than 2 is the sum of two primes. How do we write this? It says "every even integer," so if I see "every," I would expect a ∀x and some implication that comes later. What is that x? For all x, x is an even integer and x is greater than two. And what do I get for that even integer greater than two? That implies that the integer is going to be the sum of two primes. How do I write that? I'm going to use "there exists y" and "there exists z" to correspond to those two primes: there exists one prime and there exists another prime, such that y is a prime, and z is a prime, and the sum of y and z is equal to x, that integer. So: ∀x (EvenInt(x) ∧ Greater(x, 2)) → ∃y ∃z (Prime(y) ∧ Prime(z) ∧ Sum(y, z) = x).

[00:17:44] All right, let's look at another example: if a student takes a course and the course covers a concept, then the student knows the concept. "If" is kind of like "every"; remember, when we saw "every" we wrote a for-all with an implies, and "if" is basically the same thing. "If a student" basically means "for every student." So: for all x, x being a student; and x takes a course y, so for all courses y; and for all concepts z. These for-alls are for all students, for all courses, and for all concepts. If x is a student, and x takes y, and y is a course, and y covers z (if we wanted to be pedantic, we should also have "and z is a concept," but I'm skipping that), then what does that tell me? Then the student knows the concept: the thing that comes after the comma is the thing that comes after this implication. So: ∀x ∀y ∀z (Student(x) ∧ Takes(x, y) ∧ Course(y) ∧ Covers(y, z)) → Knows(x, z).

[00:18:57] All right, so that was going from natural language to first-order logic, and we were able to talk about the syntax of first-order logic using terms, and formulas defined over terms.
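The Goldbach translation above can be spot-checked by unrolling its quantifiers over a small finite range. This is just a sketch: the real statement ranges over all integers, which is exactly what propositional logic could not express, and the full conjecture remains open; here we only verify it for even integers up to 100:

```python
# Sketch (not from the lecture): unrolling the quantifiers in
# ∀x (Even(x) ∧ x > 2) → ∃y ∃z (Prime(y) ∧ Prime(z) ∧ y + z = x)
# over a small finite range of x.

def is_prime(n):
    """Trial division; fine for the tiny range we check here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

LIMIT = 100
holds = all(
    # ∃y ∃z: search for a prime y with a prime complement z = x - y.
    any(is_prime(y) and is_prime(x - y) for y in range(2, x))
    for x in range(4, LIMIT + 1, 2)  # the even integers greater than 2
)
print(holds)  # True for this range (the full conjecture is open)
```

The outer `all` plays the role of the ∀ and the inner `any` the role of the two ∃'s (collapsed into one search, since z is determined by y).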
defined over terms. So now let's talk about the semantics of first-order logic: how do we define the meaning, or semantics, of first-order logic? [00:19:08] If you remember how we defined semantics for propositional logic, we defined it using this idea of models — models representing a particular situation in the world.

[00:19:27] In propositional logic, a model w was a world that mapped propositional symbols to truth values; it was a truth assignment to propositional symbols. Going back to my original example, "Alice knows arithmetic and Bob knows arithmetic": if I were to write that in propositional logic, I would have the propositional symbols AliceKnowsArithmetic and BobKnowsArithmetic, and a model would assign one or zero to each of these propositional symbols. [00:19:55] That was propositional logic — how do we think about it in first-order logic?

[00:20:01] The way we think about this in first-order logic is by having a graph representation for every model. You can think about the predicates we have been talking about, like knowing arithmetic and so on, as unary or binary predicates defined over the terms we've been talking about. So a model w can be represented by a graph: [00:20:31] we have these different nodes, and each node corresponds to an object. An object is represented by a node, and we label each node with constant symbols. So node o1 is an object and it's labeled by alice; node o2 might be an object corresponding to both bob and robert; and o3 is a node corresponding to an object corresponding to arithmetic.

[00:20:55] Then what we can do is have directed edges here that capture binary predicates. So Alice knowing arithmetic corresponds to a directed edge, which corresponds to the predicate Knows applied to alice and arithmetic. For unary predicates, we just put the predicate on top of the node — Alice being a student, say.

[00:21:19] All right, so that defines a model here. A model in first-order logic has two components. First, it assigns constant symbols to objects: alice corresponds to node o1, bob corresponds to node o2, and arithmetic corresponds to node o3. Second, predicate symbols map to sets of tuples: the predicate Knows gives us the tuples (o1, o3) and (o2, o3) — o1 knows o3, and o2 knows o3.
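As a sketch, the graph model just described can be written down in code. This is a toy representation under assumed names (`constants`, `unary`, `binary`, `holds` are illustrative, not from the lecture):

```python
# A first-order model as a labeled graph, using the lecture's
# alice/bob/arithmetic example (toy representation, assumed names).

# Component 1: constant symbols map to objects (nodes).
constants = {"alice": "o1", "bob": "o2", "arithmetic": "o3"}

# Component 2: predicate symbols map to sets of tuples of objects.
unary = {"student": {"o1", "o2"}}                 # labels on nodes
binary = {"knows": {("o1", "o3"), ("o2", "o3")}}  # directed edges

def holds(predicate, *args):
    """Check whether an atomic formula is true in this model."""
    objs = tuple(constants[a] for a in args)
    if len(objs) == 1:
        return objs[0] in unary.get(predicate, set())
    return objs in binary.get(predicate, set())

print(holds("knows", "alice", "arithmetic"))  # True: edge (o1, o3)
print(holds("student", "arithmetic"))         # False: o3 is not labeled student
```

The two dictionaries correspond exactly to the two components of the model: the constant-symbol assignment and the predicate interpretation as tuples.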
[00:21:51] knowing o3 so so that corresponds to w of knowing that predicate of knowing [00:21:53] of knowing that predicate of knowing which basically gives us these two poles [00:21:55] which basically gives us these two poles either either they could be binary or [00:21:57] either either they could be binary or unary depending on depending on their [00:21:59] unary depending on depending on their predicate [00:22:01] predicate all right so the way we are defining a [00:22:03] all right so the way we are defining a model is a little bit more complex than [00:22:05] model is a little bit more complex than the way we defined modeled in [00:22:06] the way we defined modeled in propositional logic [00:22:09] propositional logic so so there are a few other restrictions [00:22:12] so so there are a few other restrictions that we are putting on models to make [00:22:13] that we are putting on models to make our lives a little bit easier so so if [00:22:16] our lives a little bit easier so so if you remember um okay so we can have um [00:22:18] you remember um okay so we can have um basically a statement that says john and [00:22:21] basically a statement that says john and bob are students right so how do i write [00:22:23] bob are students right so how do i write this and and first order logic i can say [00:22:26] this and and first order logic i can say john a student and and all the students [00:22:29] john a student and and all the students so students predicate on top of john and [00:22:31] so students predicate on top of john and bob [00:22:32] bob so if i think about a model [00:22:34] so if i think about a model corresponding to that i can have a note [00:22:36] corresponding to that i can have a note 01 corresponding to john and and student [00:22:39] 01 corresponding to john and and student predicate student on top of this node [00:22:41] predicate student on top of this node and i can have node o2 responding to bob [00:22:44] and i can have node o2 responding 
to bob and student on top of this [00:22:46] and student on top of this but that's one option right one could [00:22:49] but that's one option right one could have other types of models that [00:22:51] have other types of models that represent this i can have a single note [00:22:53] represent this i can have a single note and i can say well this person's name is [00:22:55] and i can say well this person's name is both john and bob maybe john and bob [00:22:58] both john and bob maybe john and bob are the same people and and uh we're [00:23:01] are the same people and and uh we're talking about uh both of them being a [00:23:03] talking about uh both of them being a student so one other option is w2 one [00:23:06] student so one other option is w2 one other way of representing this model is [00:23:07] other way of representing this model is w2 where i just write one note or maybe [00:23:10] w2 where i just write one note or maybe i have three notes maybe i have like [00:23:12] i have three notes maybe i have like this other unnamed note here that that [00:23:14] this other unnamed note here that that doesn't have anyone assigned to it so [00:23:17] doesn't have anyone assigned to it so the restriction that we are putting in [00:23:18] the restriction that we are putting in here is basically trying to make sure [00:23:20] here is basically trying to make sure that w2 and w3 doesn't happen so so [00:23:24] that w2 and w3 doesn't happen so so basically we are putting this unique [00:23:25] basically we are putting this unique names assumption which says each object [00:23:28] names assumption which says each object has at most one constant symbol for it [00:23:30] has at most one constant symbol for it and this basically rules out w and in [00:23:33] and this basically rules out w and in addition to the w [00:23:35] addition to the w sorry this rules that w two basically we [00:23:37] sorry this rules that w two basically we can't have both john and bob associated 
[00:23:40] can't have both john and bob associated to the single node to the single object [00:23:42] to the single node to the single object so we can have at most one constant [00:23:44] so we can have at most one constant symbol [00:23:45] symbol and in addition to that you're going to [00:23:46] and in addition to that you're going to have another assumption on domain [00:23:48] have another assumption on domain closure which basically says each object [00:23:50] closure which basically says each object has at least one constant symbol so so [00:23:53] has at least one constant symbol so so we can't have an object corresponding to [00:23:55] we can't have an object corresponding to o2 here that doesn't have any symbols [00:23:57] o2 here that doesn't have any symbols assigned to it so this rules out double [00:23:59] assigned to it so this rules out double e3 so this basically ensures that when [00:24:03] e3 so this basically ensures that when when we have a constant symbol a [00:24:04] when we have a constant symbol a constant symbol is equivalent to having [00:24:06] constant symbol is equivalent to having an object if i have an object there is [00:24:08] an object if i have an object there is one single constant [00:24:10] one single constant constant symbol that is assigned to it [00:24:12] constant symbol that is assigned to it okay [00:24:13] okay so why am i trying to do this like what [00:24:15] so why am i trying to do this like what would this buy me so the thing that [00:24:18] would this buy me so the thing that despised me this one to one mapping that [00:24:20] despised me this one to one mapping that we have between constant symbols and [00:24:22] we have between constant symbols and objects like using using these two [00:24:23] objects like using using these two assumptions that i've put allows me to [00:24:26] assumptions that i've put allows me to do to do an operation that called [00:24:28] do to do an operation that called 
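The two assumptions are easy to state as code. A minimal sketch, where a model's naming is just a dict from constant symbols to objects (the w1/w2/w3 naming follows the lecture's example; the function names are assumptions):

```python
# Unique names: each object has AT MOST one constant symbol
# (the symbol-to-object mapping is injective).
def unique_names(constants):
    objs = list(constants.values())
    return len(objs) == len(set(objs))

# Domain closure: each object has AT LEAST one constant symbol
# (the mapping is onto the set of objects).
def domain_closure(constants, objects):
    return set(constants.values()) == set(objects)

w1 = {"john": "o1", "bob": "o2"}          # one symbol per object: fine
print(unique_names(w1))                    # True
print(domain_closure(w1, {"o1", "o2"}))    # True

w2 = {"john": "o1", "bob": "o1"}          # john and bob name one object
print(unique_names(w2))                    # False: violates unique names

# w3: an extra unnamed object o3 in the domain
print(domain_closure(w1, {"o1", "o2", "o3"}))  # False: violates domain closure
```

Together the two checks say exactly that the mapping is a bijection between constant symbols and objects.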
[00:24:29] Using these two assumptions allows me to do an operation called propositionalization, and what that buys me is the ability to use the inference rules we had in propositional logic. The whole reason I'm doing this is so I can use ideas from propositional logic, like resolution or modus ponens, when it comes to inference in first-order logic. [00:24:46] If you think about it, with this restriction first-order logic is not anything fancy: it's just syntactic sugar on top of propositional logic. It helps us write things a little more expressively and have an easier time writing things out, but at the end of the day it's the same sort of logic [00:25:05] that goes on behind everything.

So, for example, suppose we have this example knowledge base in first-order logic. We might say Alice is a student and Bob is a student: Student(alice) and Student(bob). Every student is a person: for all x, Student(x) implies Person(x). And some students are creative: there exists an x where Student(x) and Creative(x). Okay, so that's my knowledge base in first-order logic.

[00:25:28] One can write this exact knowledge base in propositional logic. Based on the assumption that every constant symbol has a one-to-one mapping to an object, I can simply write: Student(alice) and Student(bob) — both of these are now propositional symbols, and I take the and of them. Then Student(alice) implies Person(alice), where Person(alice) is another propositional symbol, and Student(bob) implies Person(bob). And (Student(alice) and Creative(alice)) or (Student(bob) and Creative(bob)) is what I get from that last statement.

================================================================================ LECTURE 048
================================================================================
Logic 8 - First Order Modus Ponens | Stanford CS221: Artificial Intelligence (Autumn 2021)
Source: https://www.youtube.com/watch?v=mndzhfBpyUw
---
Transcript

[00:00:05] Okay, so far we've been talking about first-order logic and its syntax and semantics, and now what we'd like to do is talk about inference rules for first-order logic. In this module we're going to be talking about modus ponens when we have only Horn clauses, and in the next module we'll talk about resolution for first-order logic.

[00:00:27] All right, so if you remember, what inference rules do is symbol manipulation. They take the formulas — the syntactic form of the formulas — and they have no notion of meaning or anything of that form; but based on the formulas that are in the knowledge base, they try to infer, derive, or prove a new formula from what exists by syntactically moving things around, kind of like what we saw with modus ponens for propositional logic.

[00:00:54] So what we'd like to do is focus on applying modus ponens to first-order logic in a scenario where we have only Horn clauses. If you remember, Horn clauses were definite clauses and goal clauses, and definite clauses had the form of some set of propositional symbols — p1 and p2, for example — implying some q: some positive literals and-ed with each other, implying a new positive literal. So how do we extend that idea of a definite clause to the space of first-order logic?

[00:01:28] If you look at definite clauses in first-order logic, you're going to have a set of variables, with quantifiers on top of them. For example, here's a definite clause: for all x, for all y, for all z, the predicate Takes(x, y) and-ed with another predicate Covers(y, z) implies a whole new predicate Knows(x, z). So we have these atomic formulas and-ed with each other, a set of quantifiers outside, and this implication.

[00:02:00] So if you propositionalize here, we get one formula for each value of x, y, and z. If you remember propositionalization from the last module, we can think about x, y, and z taking specific values — x being alice, y being cs221, and z being mdp — and if you think about each of these formulas taking one value for each x, each y, and each z, we end up with propositional logic formulas that are in fact definite clauses.
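That grounding step can be sketched as a loop over all assignments of the variables. A toy propositionalization of the clause above — the constant lists and the string encoding of atoms are assumptions for illustration, not lecture code:

```python
# Ground the definite clause
#   forall x, y, z: Takes(x, y) AND Covers(y, z) -> Knows(x, z)
# over small assumed constant sets: one propositional clause per (x, y, z).
from itertools import product

people = ["alice", "bob"]   # values for x (assumed)
courses = ["cs221"]         # values for y (assumed)
concepts = ["mdp"]          # values for z (assumed)

clauses = []
for x, y, z in product(people, courses, concepts):
    premises = [f"Takes({x},{y})", f"Covers({y},{z})"]
    conclusion = f"Knows({x},{z})"
    clauses.append((premises, conclusion))

for premises, conclusion in clauses:
    print(" AND ".join(premises), "->", conclusion)
# Takes(alice,cs221) AND Covers(cs221,mdp) -> Knows(alice,mdp)
# Takes(bob,cs221) AND Covers(cs221,mdp) -> Knows(bob,mdp)
```

Each ground clause is a propositional definite clause, which is exactly why propositional inference rules become applicable after grounding.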
[00:02:34] But we'd like to be able to represent this in this more expressive way, and because of that we define definite clauses in first-order logic using these variables, these quantifiers, and so on. More formally, a definite clause has the following form: for all x1 through xn, where x1 through xn are variables, we have atomic formulas A1 through Ak and B — all of these are atomic formulas — and A1 and-ed through Ak implies B. Remember, these atomic formulas actually contain the variables x1 through xn inside them, kind of like the example up here.

[00:03:23] All right, so that's a definite clause in first-order logic. How can we do modus ponens in first-order logic? If this is a definite clause — for all x1 through xn, A1 and-ed through Ak implies B — one possible attempt, maybe our first attempt at modus ponens, is: we have this, and in addition maybe our knowledge base has A1 through Ak, and based on these premises maybe we can conclude B.

[00:03:56] So does this definition of modus ponens work? Let's look at an example. It turns out it actually doesn't work. Consider this example: we have P(alice) — P is a predicate applied to alice, and maybe that plays the role of our A1 — and then we say for all x, P(x) implies Q(x). Ideally, what should I get from this? Ideally I'd like to get Q(alice). But I'm really not able to do that. Why not? Because, remember, modus ponens is an inference rule, and inference rules don't know anything about semantics or meanings — they're just matching symbols. And if I'm just matching symbols, first off, P(alice) has nothing to do with P(x): I can't match P(alice) and P(x), so I'm stuck — I can't apply this modus ponens idea at all. And even if I could somehow say P(alice) and P(x) are the same thing, I'm still not going to be able to get Q(alice), because Q(alice) and Q(x) are very different things. So I can't infer Q(alice), and P(x) and P(alice) don't match either. The modus ponens rule I've written here just doesn't work; this is not the modus ponens we should be using in first-order logic.
[00:05:19] So how are we going to solve this? There are two ideas I'm going to talk about in this module: substitution and unification. Substitution and unification are the things that are going to improve our modus ponens and help us apply modus ponens in first-order logic. So let's look at what they are.

[00:05:35] What is substitution? Substitution takes a substitution rule that substitutes a variable with a term, takes a formula, and substitutes all of those variables in the formula with the terms it is given. One thing to notice is that it substitutes a variable, like x, with a term — and a term, if you remember our module on the syntax of first-order logic, is either a constant symbol, or another variable, or a function. So in this example, alice is a constant symbol, and I'm replacing the variable x with the constant symbol alice.

[00:06:18] Here's another example: I'm substituting x with alice, and substituting y with z — another variable — in the formula P(x) and K(x, y). I'm basically doing find-and-replace: find x, replace it with alice; find y, replace it with z. That's what substitution does. So a substitution θ is a mapping from variables to terms, and Subst[θ, f] returns the result of performing that substitution on the formula f.
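The find-and-replace view of substitution can be sketched directly. Assumptions here (not the lecture's code): atomic formulas are nested tuples like `("K", "x", "y")`, whose first element is a predicate or function symbol, and anything appearing as a key in θ is treated as a variable:

```python
# Sketch of Subst[theta, f]: recursively replace variables with terms.
def substitute(theta, f):
    """Apply substitution theta (dict: variable -> term) to formula f."""
    if isinstance(f, tuple):  # predicate or function application
        return (f[0],) + tuple(substitute(theta, arg) for arg in f[1:])
    return theta.get(f, f)    # variables get replaced; constants pass through

# Find x, replace with alice:
print(substitute({"x": "alice"}, ("P", "x")))                 # ('P', 'alice')
# Find x -> alice and y -> z in K(x, y):
print(substitute({"x": "alice", "y": "z"}, ("K", "x", "y")))  # ('K', 'alice', 'z')
# Terms can be functions too: f(x) becomes f(alice):
print(substitute({"x": "alice"}, ("Knows", "x", ("f", "x"))))
# ('Knows', 'alice', ('f', 'alice'))
```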
Okay, so that was substitution. What does unification do? Unification takes two formulas and tries to match them as closely as possible, and it returns a substitution rule that matches those formulas as closely as possible. [00:07:13] So if I do Unify[Knows(alice, arithmetic), Knows(x, arithmetic)], I have these two formulas, I try to match them as closely as possible, and the substitution rule that does it is: replace the variable x with alice. That's what I'm going to return.

[00:07:29] Let's look at another example: Unify[Knows(alice, y), Knows(x, z)]. What is a substitution rule that gets me there? I'll get a substitution rule that says: replace the variable x with alice, and replace the variable y with z. That's the substitution rule I get out of unifying these two formulas.

[00:07:51] Here's another example: Unify[Knows(alice, y), Knows(bob, z)]. This is going to return "fail". The reason is that I'm not able to substitute a constant symbol with another constant symbol. Remember, we substitute variables with terms, and in the first arguments there are no variables to substitute — there are two constant symbols. So I can't substitute these, and I get "fail" from unification here.

[00:08:18] And here's another example: Unify[Knows(alice, y), Knows(x, f(x))] — with a function here. A substitution rule is: take the variable x and replace it with alice, and take the variable y and replace it with f(alice). I'm taking the most general form of this: I could have had f(x) here, but because I already know in my substitution rule that x needs to be replaced by alice, instead of putting f(x) I put f(alice) — I've already replaced x by alice.
substitution which is the most general form of a [00:08:57] which is the most general form of a unifier so to unify f and g two formulas [00:09:00] unifier so to unify f and g two formulas return of theta so then if i if i do [00:09:03] return of theta so then if i if i do substitution of theta and f that gives [00:09:05] substitution of theta and f that gives me the same thing as substitution of [00:09:08] me the same thing as substitution of theta in g and it returns fail if such a [00:09:11] theta in g and it returns fail if such a such a substitution doesn't exist okay [00:09:14] such a substitution doesn't exist okay so why am i defining these so the reason [00:09:16] so why am i defining these so the reason i'm defining unification and [00:09:17] i'm defining unification and substitution is i can now modify my [00:09:20] substitution is i can now modify my modus ponens and i can use this idea of [00:09:22] modus ponens and i can use this idea of substitution and unification in order to [00:09:24] substitution and unification in order to make modus ponens work in first order [00:09:26] make modus ponens work in first order logic so here i'm going to have [00:09:29] logic so here i'm going to have different a1 prime through a k prime [00:09:31] different a1 prime through a k prime these atomic formulas from a1 through ak [00:09:34] these atomic formulas from a1 through ak and different b prime than b these are [00:09:36] and different b prime than b these are going to be different atomic formulas [00:09:38] going to be different atomic formulas okay specifically if you think about it [00:09:41] okay specifically if you think about it these a ones prime through a k prime are [00:09:44] these a ones prime through a k prime are are groundings of this a1 through a k [00:09:47] are groundings of this a1 through a k which basically operate on on these [00:09:50] which basically operate on on these these variables x's and b again operates [00:09:53] these variables 
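The unification examples above, together with the formal Unify/Subst definition, can be sketched in a few lines of Python. This is a hedged illustration, not course code: the term representation (nested tuples, lowercase strings as variables, capitalized strings as constants, so the constant "arithmetic" is spelled "Arithmetic" here), the names `unify`/`subst`, and the omission of the occurs check are all my own simplifications.

```python
# A minimal unification sketch. Terms are nested tuples: variables are
# lowercase strings, constants are capitalized strings, and compound
# terms/atoms are tuples like ("Knows", "Alice", "y").

def is_var(t):
    return isinstance(t, str) and t[0].islower()

def subst(theta, t):
    """Apply substitution theta (dict: variable -> term) to term t."""
    if is_var(t):
        return subst(theta, theta[t]) if t in theta else t
    if isinstance(t, tuple):
        return tuple(subst(theta, a) for a in t)
    return t

def unify(f, g, theta=None):
    """Return the most general unifier of f and g, or None (fail)."""
    theta = {} if theta is None else theta
    f, g = subst(theta, f), subst(theta, g)
    if f == g:
        return theta
    if is_var(f):
        return {**theta, f: g}
    if is_var(g):
        return {**theta, g: f}
    if isinstance(f, tuple) and isinstance(g, tuple) and len(f) == len(g):
        for a, b in zip(f, g):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None  # e.g. two different constant symbols

# The lecture's four examples:
print(unify(("Knows", "Alice", "Arithmetic"), ("Knows", "x", "Arithmetic")))
# -> {'x': 'Alice'}
print(unify(("Knows", "Alice", "y"), ("Knows", "x", "z")))
# -> {'x': 'Alice', 'y': 'z'}
print(unify(("Knows", "Alice", "y"), ("Knows", "Bob", "z")))
# -> None: Alice vs Bob, one constant cannot be substituted for another
print(unify(("Knows", "Alice", "y"), ("Knows", "x", ("F", "x"))))
# -> {'x': 'Alice', 'y': ('F', 'Alice')}
```

Note the last case: because x is already bound to Alice, the binding for y is F(Alice) rather than F(x), matching the "most general form" discussed above.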
And b again operates on a variable x, while b' you can think of as a grounding of b. Now, b' and b, or a1' through ak' and a1 through ak, don't look the same, right? That's why I can't just syntactically replace them by each other. But what I can do is use substitution and unification: first, I take my a1' through ak', my groundings, and these other atomic formulas a1 through ak, and I unify them.
[00:10:18] Once I unify them, I get a substitution θ, and then I can derive b'. And what is b'? b' is the result of substituting θ in b, and that is going to be my new modus ponens rule. So I'll end up with a grounded version of b, namely b'. How do I get it? By substituting θ in b. And where do I get θ? By unifying a1' through ak' with a1 through ak.
[00:10:47] Okay, let's look at an example. Say that in my knowledge base I have a premise that says Takes(Alice, CS221); this is my a1', a grounded version of Takes(x, y). And then I have Covers(CS221, MDP), again a grounded version of Covers(y, z). So first I do a unification of these two formulas with those two formulas, and based on that unification I get a substitution rule: take variable x and replace it by Alice, take variable y and replace it by CS221, and take variable z and replace it by MDP.
[00:11:29] And then what am I going to return out of modus ponens? Modus ponens basically tells me: this is your b, and you want to return a modified version of b. What is that modified version? It's the result of using this substitution rule on your b, on this Knows(x, z). So if I substitute θ in Knows(x, z), I get Knows(Alice, MDP), and that is the thing I'm going to be deriving here, or proving here. That's basically applying modus ponens in first-order logic.
[00:12:02] So let's think about the complexity of this: what is the time complexity here, and how bad is it? If you remember, when we were doing modus ponens in propositional logic, every time we ran modus ponens we were adding one propositional symbol. Similarly here, every time you run modus ponens you're only adding one atomic formula, which is actually pretty good.
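The Takes/Covers derivation above can be played out in code. A minimal sketch under my own assumptions: atoms are flat tuples, lowercase arguments are variables, capitalized ones are constants, and the names `unify_atoms`/`modus_ponens` are illustrative, not the course's implementation.

```python
# A toy run of first-order modus ponens on the lecture's example.

def unify_atoms(ground_atoms, pattern_atoms):
    """Unify ground atoms a1'..ak' with rule antecedents a1..ak.
    Returns the substitution theta as a dict, or None on failure."""
    theta = {}
    for ga, pa in zip(ground_atoms, pattern_atoms):
        if ga[0] != pa[0] or len(ga) != len(pa):   # predicate must match
            return None
        for g, p in zip(ga[1:], pa[1:]):
            if p[0].islower():                     # p is a variable
                if theta.setdefault(p, g) != g:    # bindings must stay consistent
                    return None
            elif p != g:                           # constant-vs-constant mismatch
                return None
    return theta

def modus_ponens(ground_atoms, rule):
    """rule = (antecedents a1..ak, conclusion b); derive b' = Subst[theta, b]."""
    antecedents, (pred, *args) = rule
    theta = unify_atoms(ground_atoms, antecedents)
    if theta is None:
        return None
    return (pred, *[theta.get(a, a) for a in args])

# KB premises: Takes(Alice, CS221) and Covers(CS221, MDP)
premises = [("Takes", "Alice", "CS221"), ("Covers", "CS221", "MDP")]
# Rule: forall x, y, z: Takes(x, y) and Covers(y, z) -> Knows(x, z)
rule = ([("Takes", "x", "y"), ("Covers", "y", "z")], ("Knows", "x", "z"))
print(modus_ponens(premises, rule))  # -> ('Knows', 'Alice', 'MDP')
```

Unification yields θ = {x: Alice, y: CS221, z: MDP}, and substituting θ into the conclusion Knows(x, z) produces the derived ground atom, exactly as in the lecture's walk-through.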
And in addition to that, if you don't have any functions, if there are no functions going on here, then the number of these atomic formulas is at most the number of constant symbols we have, raised to the power of the maximum predicate arity. So in this example I might have P(x, y, z), and maybe x takes a hundred values, y takes a hundred values, and z takes a hundred values; then I'm going to get a hundred to the power of three, which is not bad. But the thing is, if there are functions here, then we actually end up with an infinite number of them being applied to each other, so this becomes unbounded. If I have a function, I can keep applying it, and then I end up with an infinite number of things being added, because I can keep applying the function to its own result. Remember, for example, the Sum function that we saw earlier in one of the examples: we had Sum(3, x), right?
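Both counting claims above are easy to sanity-check; here is a quick illustrative script (the 100 constants and the `Sum` nesting just reuse the lecture's numbers):

```python
# Without function symbols, the ground atoms of one arity-3 predicate over
# n constant symbols number exactly n**3: large but finite.
from itertools import product

constants = [f"c{i}" for i in range(100)]               # n = 100 constants
n_atoms = sum(1 for _ in product(constants, repeat=3))  # all ground P(x, y, z)
print(n_atoms == 100 ** 3)  # -> True (a million atoms, but a finite number)

# With a function symbol like Sum, terms nest without bound, so the set of
# atomic formulas becomes infinite:
term = "x"
for _ in range(3):
    term = f"Sum(3, {term})"
print(term)  # -> Sum(3, Sum(3, Sum(3, x)))
```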
So I can keep applying Sum on top of itself and almost recreate arithmetic by applying Sum to itself, but we are going to get an unbounded number of formulas here, which is not that great. Okay.
[00:13:39] All right, what else do we know about modus ponens in this space of first-order logic? What we know is that modus ponens turns out to be complete for first-order logic with only Horn clauses. This is a similar type of completeness to what we had when we looked at modus ponens in propositional logic: again, if you are limited to Horn clauses, we have completeness in first-order logic as well.
[00:14:05] In addition to that, we know that first-order logic, even when it is restricted to Horn clauses, is only semi-decidable. So what does that mean? It means that if our knowledge base entails f, and we want to figure out whether it entails f or not, then if it actually does entail f and we keep doing forward inference, repeatedly trying to derive new formulas using modus ponens until convergence, then getting f takes finite time. So if my knowledge base actually entails f, I should be able to derive f, to prove f using just inference rules, in finite time, which is pretty nice.
[00:14:52] But the difficulty that gets me to semi-decidability is this: if the knowledge base doesn't entail f, I might not know whether it entails f or doesn't entail it. If I don't know, and the knowledge base actually doesn't entail f, it turns out that there are no algorithms that can show this in finite time.
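This asymmetry can be illustrated with a forward-inference loop run under a step budget. This is a toy over propositional-style Horn rules, with names and encoding of my own choosing: in this finite toy the loop always converges, whereas in first-order logic it is function symbols that make inference run forever on non-entailed queries.

```python
# Semi-decidability in miniature: forward inference can answer "yes, the KB
# entails the goal" in finite time, but failing to derive the goal does not,
# in general, justify answering "no" -- so we return None, not False.

def forward_infer(facts, rules, goal, budget=100):
    """rules: list of (set_of_antecedents, conclusion).
    Returns True if goal is derived within `budget` rounds, else None."""
    derived = set(facts)
    for _ in range(budget):
        if goal in derived:
            return True
        new = {b for (ants, b) in rules if ants <= derived and b not in derived}
        if not new:          # converged without deriving the goal
            return None      # "unknown" -- in general we cannot conclude "no"
        derived |= new
    return True if goal in derived else None

facts = {"Takes(Alice,CS221)", "Covers(CS221,MDP)"}
rules = [({"Takes(Alice,CS221)", "Covers(CS221,MDP)"}, "Knows(Alice,MDP)")]
print(forward_infer(facts, rules, "Knows(Alice,MDP)"))  # -> True
print(forward_infer(facts, rules, "Knows(Bob,MDP)"))    # -> None
```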
[00:15:10] Okay, and this is actually related to the halting problem: people have shown that there are no algorithms that can do this in finite time, so we are kind of screwed in that case. But in general this is not too bad: you can have a budget for the amount of time you're going to run your inference rules, run it, and see if you get lucky; if the KB actually entails f, you're going to be able to get f in finite time. So you can actually run modus ponens in first-order logic when you have Horn clauses, and it does work in those instances where the KB actually entails f.
[00:15:52] But in the next module, what we would like to talk about is going beyond modus ponens: we want to talk about resolution, specifically how resolution would work in
first-order logic.

================================================================================
LECTURE 049
================================================================================
Logic 9 - First Order Resolution | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=iG_tz7ZjZAI
---
Transcript

[00:00:05] Okay, so in this module we are going to be talking about resolution for first-order logic. This is an optional module, but I think it would be interesting to think about how we could apply resolution when we have this more complicated logic, this first-order logic. So far we have talked about syntax and semantics, and we have talked about modus ponens when we have Horn clauses in first-order logic; now we want to extend this idea of applying inference to settings where we don't necessarily have Horn clauses. If you think about first-order logic, it's not really limited to settings with Horn clauses; we sometimes have non-Horn clauses. Here's actually an example: for all x, Student(x) implies there exists a y such that Knows(x, y).
[00:00:51] Okay, so this "there exists y" here is going to create a non-Horn clause. And why is that? Because an existential quantifier is really a glorified "or", a glorified disjunction. What this is basically getting us is Knows(x, y1) or Knows(x, y2) and so on, and that creates an "or" on this side of the implication, which makes this particular statement a non-Horn clause. So what does that mean? It means I can't just apply modus ponens to it. So what can we do here?
[00:01:25] The high-level strategy is this: we have this first-order logic formula, and first off you need to convert it to CNF, to conjunctive normal form. This is similar to before: even in propositional logic, when we had something that wasn't a Horn clause, we started by writing it in CNF form. And then after that we repeatedly apply the resolution rule to it. Our resolution rule here is going to be slightly different from the resolution rule we had in propositional logic, because, similar to modus ponens, we need to do unification and substitution; so we change our resolution rule to include that element of unification and substitution.
[00:02:06] Converting to CNF is also not exactly like converting to CNF in propositional logic; there are going to be a few new things, and I'm going to attempt to give you some ideas around them. But in general I'm just giving a high-level strategy, an idea of how you would apply resolution to first-order logic; this is not a complete explanation, and in general it gets a little bit messy when you think about applying resolution to first-order logic.
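For reference, the standard first-order binary resolution rule adds exactly this unification step to the propositional rule. A sketch, in the Subst/Unify notation used earlier (the exact formulation on the course slides may differ slightly):

```latex
\frac{f_1 \lor p \qquad\qquad f_2 \lor \lnot q}
     {\mathrm{Subst}[\theta,\; f_1 \lor f_2]}
\qquad \text{where } \theta = \mathrm{Unify}[p, q] \neq \mathrm{fail}
```

That is: to resolve two clauses, unify a positive literal p from one with a negated literal q from the other, and apply the resulting substitution to everything that remains.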
So think of this as a big-picture, high-level strategy and overview for applying resolution here. Okay.
[00:02:43] All right, so let's start with a formula. Let's say this is our formula: for all x, if for all y, Animal(y) implies Loves(x, y), then there exists a y such that Loves(y, x). Okay, so this is some statement, some formula, and what we would like to do is convert it to CNF form. So what does CNF form look like in first-order logic? At the end of the day, the output is going to look something like this: an "and" of a bunch of clauses, and these are clauses because they have "or"s between them. In addition to that, we have these new functions, this capitalized Y and capitalized Z; these are called Skolem functions, and I'm going to talk about what they are in a few slides.
[00:03:30] Okay, so there are a few things that are new when we think about the CNF form. The first thing is that all the variables in this form are universally quantified: there is a "for all x" here that I've just dropped, so in reality there's a "for all x" in front. And then there are these Skolem functions, which stand for the things that were existentially quantified; basically they represent existential quantifiers, and they are functions of this x that has the "for all x" on it. So those are the two new things that happen in order to get a CNF form of a first-order logic formula.
[00:04:14] Let's actually go through an example. Let's start with the statement that says: anyone who likes all animals is liked by someone.
write this as an [00:04:25] someone okay so one can write this as an input that says for all X for all y y is [00:04:28] input that says for all X for all y y is an animal implies X LS Y and that full [00:04:31] an animal implies X LS Y and that full thing implies that there exist in y so y [00:04:35] thing implies that there exist in y so y LS X [00:04:37] LS X okay all right so uh first thing to do [00:04:41] okay all right so uh first thing to do is similar to before if you want to like [00:04:43] is similar to before if you want to like follow like the steps of converting the [00:04:45] follow like the steps of converting the stone CNF form we're going to eliminate [00:04:48] stone CNF form we're going to eliminate implication so I'm going to eliminate [00:04:49] implication so I'm going to eliminate this outside imp implication how do I [00:04:52] this outside imp implication how do I elimin eliminate it I'm going to take [00:04:54] elimin eliminate it I'm going to take the negation of what comes before it so [00:04:56] the negation of what comes before it so negation up until here or or the rest of [00:05:00] negation up until here or or the rest of the statement I'm also going to replace [00:05:04] the statement I'm also going to replace this um implication by negation of the [00:05:06] this um implication by negation of the first part or the second part so [00:05:08] first part or the second part so negation of the first part or the second [00:05:10] negation of the first part or the second part and we get this this statement okay [00:05:14] part and we get this this statement okay now I'm going to push negations inwards [00:05:16] now I'm going to push negations inwards and eliminate double negations this is [00:05:18] and eliminate double negations this is exactly what we have done before so let [00:05:20] exactly what we have done before so let me push negations inside and it goes all [00:05:22] me push negations inside and it goes all the way to 
negation of love and and now [00:05:25] the way to negation of love and and now we we have ended up with this formula [00:05:27] we we have ended up with this formula where we have these quantifiers right [00:05:29] where we have these quantifiers right like we have these for all and they [00:05:30] like we have these for all and they exist and so on and everything else is [00:05:33] exist and so on and everything else is an atomic formula right remember before [00:05:35] an atomic formula right remember before like when we were trying to convert [00:05:36] like when we were trying to convert things to a CNF form we would end up [00:05:38] things to a CNF form we would end up with propositional with propositional [00:05:40] with propositional with propositional symbols right so we would have we would [00:05:42] symbols right so we would have we would end up with propositional symbols that [00:05:44] end up with propositional symbols that that could take a positive or negative [00:05:46] that could take a positive or negative negative value right so we would have [00:05:48] negative value right so we would have positive or negative literals at the end [00:05:49] positive or negative literals at the end of the day but here we have Atomic [00:05:51] of the day but here we have Atomic formulas so so we end up with this [00:05:53] formulas so so we end up with this Atomic formulas or negations of these [00:05:55] Atomic formulas or negations of these Atomic formulas okay so now one thing [00:05:58] Atomic formulas okay so now one thing that is new is we're kind of [00:06:00] that is new is we're kind of standardizing the variables here so so [00:06:03] standardizing the variables here so so we have a y here and we have a y here [00:06:05] we have a y here and we have a y here but but there is this existential [00:06:07] but but there is this existential quantify on each of them and these y are [00:06:09] quantify on each of them and these y are kind of treated as as a 
local variable. [00:06:12] So in order to avoid confusion, you're going to define a new variable for each of them: I'm going to define a z here and keep this one as y. [00:06:20] Again, the reason I'm doing this is that at the end of the day I'm removing this "for all x", and I want to make sure that this y is a function of x and this z is a function of x, and that these are two different local variables. [00:06:33] All right, so this is new: I'm standardizing variables; this is a new step that is done here. [00:06:40] Okay, now that we are left with this formula, what we are going to do is replace all these existentially quantified variables with something called a Skolem function. [00:06:53] So before, we had "there exists a y", and this y depends on x too, right? For all x there exists a y, so this y is really a function of x: the Skolem function is y as a function of x, or z as a function of x. [00:07:11] So I'm going to write these Skolem functions as functions of the variable that is universally quantified, and then later on I'm going to drop this "for all x", which makes my life easier. [00:07:26] And then finally I need to distribute "or" over "and", so I can end up with clauses in conjunctive normal form; this is a step similar to what we've had before in propositional logic. Then I remove the universal quantifiers, and this is what I end up with: a formula in CNF, in first-order logic. [00:07:45] So just to recap what is new here: we have Skolem functions, which represent existential quantifiers, and variables that are universally quantified.
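The final conversion step just described, distributing "or" over "and", can be sketched in a few lines. The nested-tuple encoding of formulas below is my own illustration, not the course's code, and it assumes negations have already been pushed inward:

```python
# Sketch of the distribute step of CNF conversion (toy representation):
# a formula is a literal string like 'P', or a tuple ('and', f, g) / ('or', f, g).

def to_cnf(f):
    """Distribute 'or' over 'and', assuming negations are already pushed in."""
    if isinstance(f, tuple) and f[0] == 'and':
        return ('and', to_cnf(f[1]), to_cnf(f[2]))
    if isinstance(f, tuple) and f[0] == 'or':
        a, b = to_cnf(f[1]), to_cnf(f[2])
        # (x and y) or b  ==  (x or b) and (y or b)
        if isinstance(a, tuple) and a[0] == 'and':
            return ('and', to_cnf(('or', a[1], b)), to_cnf(('or', a[2], b)))
        # a or (x and y)  ==  (a or x) and (a or y)
        if isinstance(b, tuple) and b[0] == 'and':
            return ('and', to_cnf(('or', a, b[1])), to_cnf(('or', a, b[2])))
        return ('or', a, b)
    return f  # a literal is already in CNF

# Example: P or (Q and R)  ->  (P or Q) and (P or R)
print(to_cnf(('or', 'P', ('and', 'Q', 'R'))))
# -> ('and', ('or', 'P', 'Q'), ('or', 'P', 'R'))
```

After this step every formula is a conjunction of clauses, which is the shape the resolution rule below expects.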
I've also dropped the universal quantifier on all my variables here. [00:08:01] Those are the core differences. [00:08:04] Okay, so now we are ready to talk about resolution. Now that we can write our first-order logic formulas in CNF, we can write the resolution rule as follows. [00:08:15] We have the atomic formulas F1 ∨ ... ∨ Fn ∨ p, and then another thing in our set of premises, ¬q ∨ G1 ∨ ... ∨ Gm, and notice that p and q could be different things, because they might just look different from each other. [00:08:31] So what we do is unify p and q, and when we unify p and q we get a substitution θ. [00:08:40] Then what we can derive here from resolution is the disjunction F1 ∨ ... ∨ Fn ∨ G1 ∨ ... ∨ Gm, with θ substituted into it: we are basically canceling out p and q with each other, and the reason we can do that is that we have unified p and q with the substitution θ, so in this new formula we are substituting θ into the formula. [00:09:05] This is similar to the substitution and unification that we did in modus ponens; we're just doing it now in resolution, on these CNF clauses that we have just created. [00:09:18] Okay, let's look at an example. Let's say that I have two CNF clauses: Animal(Y(x)) ∨ Loves(Z(x), x), and ¬Loves(u, v) ∨ Feeds(u, v). [00:09:35] Loves and the negation of Loves are the things I would like to be able to do unification on. So if I unify these two, I'm going to come up with a substitution that says: substitute the variable u with the function Z(x), and substitute the variable v with the variable x.
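The unification step in this example can be sketched as follows. The term encoding (tuples for function applications, lowercase single-letter strings for variables) and the helper names are my own assumptions, not the lecture's code:

```python
# Sketch of first-order unification (toy representation):
# a term is a variable like 'u', or a tuple (functor, arg1, ...).

def is_var(t):
    return isinstance(t, str) and t.islower()

def substitute(t, theta):
    """Apply substitution theta to term t, following chains of bindings."""
    if is_var(t):
        return substitute(theta[t], theta) if t in theta else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, theta) for a in t[1:])
    return t

def unify(s, t, theta=None):
    """Return a substitution making s and t equal, or None on failure.
    (No occurs-check, for brevity; real implementations need one.)"""
    theta = dict(theta or {})
    s, t = substitute(s, theta), substitute(t, theta)
    if s == t:
        return theta
    if is_var(s):
        theta[s] = t
        return theta
    if is_var(t):
        theta[t] = s
        return theta
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        for a, b in zip(s[1:], t[1:]):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None

# Unify Loves(u, v) with Loves(Z(x), x), as in the lecture's example:
theta = unify(('Loves', 'u', 'v'), ('Loves', ('Z', 'x'), 'x'))
print(theta)  # {'u': ('Z', 'x'), 'v': 'x'}

# Applying theta to the surviving literal Feeds(u, v) gives Feeds(Z(x), x):
print(substitute(('Feeds', 'u', 'v'), theta))  # ('Feeds', ('Z', 'x'), 'x')
```

The resolvent is then the remaining literals with θ applied, exactly as the rule above prescribes.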
[00:09:55] And at the end of the day, the thing that I'm inferring, that I'm deriving here, is going to basically cancel out these two and give Animal(Y(x)) ∨ Feeds(u, v), except that I'm not going to leave u and v in there anymore. Why is that? Because I'm substituting this θ: I'm substituting Z(x) for u and x for v. [00:10:17] So the thing that I'm proving at the end of the day is Animal(Y(x)) ∨ Feeds(Z(x), x). [00:10:26] Okay, so there's quite a bit of symbol manipulation going on here, but the gist of it is that this is very similar to the resolution we have seen so far, combined with unification and substitution over these new CNF clauses that we have talked about. [00:10:39] And that summarizes how we do inference using resolution in first-order logic. ================================================================================ LECTURE 050 ================================================================================ Logic 10 - Recap |
Stanford CS221: Artificial Intelligence (Autumn 2021) Source: https://www.youtube.com/watch?v=LYsOjtmLpPo --- Transcript [00:00:05] okay so this is the last module of the last set of lectures this quarter, so let's do a recap of logic. What have we talked about? [00:00:15] We talked about logic this week, and we talked about three main ingredients of logic. We talked about syntax, which basically defines a set of formulas: it allows us to syntactically, symbolically talk about formulas and about things that exist in the world. [00:00:29] So for example I might say Rain ∧ Wet without knowing what "rain" means or what this "and" symbol means; that is the syntax land, where I just have symbols and I can manipulate those symbols. [00:00:42] And then I can assign meanings to the syntax using semantics. The idea of semantics is that for every formula f you can specify a set of models M(f), which is basically a set of assignments or configurations of the world that assign meaning to a syntactic formula f. [00:01:00] So in the case of Rain ∧ Wet, for example, rain can take values 0 and 1, wet can take values 0 and 1, and it would be the darker area that corresponds to the meaning of both rain and wet being true. [00:01:15] So in general, when we try to define a logic, we need both syntax and semantics: syntax as a way of just writing out the formulas, semantics as a way of giving meaning to those formulas. [00:01:28] And in addition to syntax and semantics, we talked about inference rules. We spent quite a bit of time talking about modus ponens and resolution, for both propositional logic and first-order logic, as ways of doing inference on our knowledge base. [00:01:40] So we have a knowledge base, which has a bunch of formulas, and the question is: what are some new formulas that we can derive from that knowledge base? For example, you might have Rain ∧ Wet, and from that I can derive Rain; I can actually derive and prove that it's raining. [00:01:55] So how do we think about inference rules, how can we infer new formulas, and what can we tell about the formulas that we infer? I think that is also an interesting question that we have been talking about. [00:02:09] So how do we think about inference algorithms? If we have a knowledge base and an inference rule like modus ponens or resolution, we should repeatedly apply that inference rule to derive new formulas f. [00:02:22] As we get new formulas, we're expanding our knowledge base, but we're shrinking the space of models, because we're adding more constraints: if I add a new formula, in general I'm shrinking my space. If I'm deriving a formula, though, one that just follows from the knowledge base, it's not really changing the space of models. [00:02:43] So here is an example. Let's say I have Wet, Weekday, and Wet ∧ Weekday → Traffic. From these three formulas in my premises, what can I conclude, what can I infer? We talked about modus ponens as an inference rule that allows us to infer Traffic out of this. [00:03:00] More generally, what modus ponens does is this: given a set of propositional symbols p1 through pk, and a formula p1 ∧ ... ∧ pk → q, it says we can derive q. That is what modus ponens does. [00:03:20] And then we talked about soundness and completeness of inference rules, with modus ponens as an example.
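The "repeatedly apply modus ponens" idea can be sketched as a small fixed-point loop over the knowledge base. The representation of facts and rules below is my own toy encoding, not the course's:

```python
# Sketch of applying modus ponens to a fixed point (toy representation):
# a fact is a symbol string; a rule is (set_of_premises, conclusion),
# encoding p1 and ... and pk -> q.

def forward_chain(facts, rules):
    """Apply modus ponens repeatedly until no new symbol can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # If all premises hold and q is new, derive q.
            if conclusion not in derived and all(p in derived for p in premises):
                derived.add(conclusion)
                changed = True
    return derived

# The lecture's example: Wet, Weekday, and Wet and Weekday -> Traffic.
kb = forward_chain({'Wet', 'Weekday'}, [({'Wet', 'Weekday'}, 'Traffic')])
print(sorted(kb))  # ['Traffic', 'Weekday', 'Wet']
```

Note the linear-time flavor mentioned later in this recap: each pass can only add whole symbols, so the loop runs at most once per symbol in the knowledge base.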
[00:03:28] So what does soundness mean? Soundness means that if you're deriving new formulas, you need to make sure that these new formulas are actually true: they live in the space of things that are entailed. [00:03:42] And if you remember our example with the glass and the water inside the glass, what soundness means is that anything we derive should be inside the glass, because everything that's in the glass is entailed and is true. So if you want your rule to be sound, everything you derive has to be inside the glass. [00:04:02] On the other hand, we talked about completeness, and completeness means that we are deriving the whole truth: I should be able to derive everything that is inside the glass, or even more. That is what completeness means. [00:04:17] And if I have both soundness and completeness, then derivation and entailment are basically the same thing. Remember, entailment is about meanings, about semantics: entailment asks whether f is actually entailed by the knowledge base or contradicted by it. But derivation is just basically symbol manipulation. [00:04:41] So it's difficult to reason about entailment in the semantics land, because you have to think about meanings and so on; but with derivation, in the world of syntax, you're just moving formulas around, and by moving formulas around and applying inference rules, almost mindlessly, you can derive new formulas. That gives you a compact way of thinking about these formulas and the new formulas being derived. [00:05:04] So if derivation is the same thing as entailment, that's pretty nice: if you have a virtual assistant and you want to ask it a question, or tell it some information, that corresponds to an entailment question, and that might be difficult to answer. Instead, if you have a sound and complete inference rule for your logic, then you can just check derivation, and derivation alone is going to give you the answer. [00:05:32] That is why we talked about derivation for a bit. [00:05:35] And we discussed modus ponens for propositional logic, and the fact that modus ponens is actually sound for propositional logic, but it is not complete: it's not able to get all the formulas that are true. [00:05:49] So in order to solve that, we had two solutions. One
was that maybe propositional logic is too large: maybe we should reduce the size of propositional logic. The other idea was that maybe modus ponens is not strong enough: maybe we should make modus ponens stronger, or come up with a stronger inference rule. [00:06:07] So let's talk about those two ideas. The first idea: propositional logic allows us to talk about any legal combination of symbols, and that is pretty expressive, but maybe it is too expressive. So maybe we can just look at propositional logic with only Horn clauses. [00:06:23] This is a restricted set of logical formulas, and that restriction allows us to get both soundness and completeness with modus ponens. [00:06:35] So what is a Horn clause? A Horn clause is basically a clause that has at most one positive literal: if you write it in conjunctive normal form, you want to make sure that you have at most one positive literal. [00:06:50] Another way of writing it is that you have an "and" of a set of propositional symbols p1 through pk, and that implies some q; you basically want to make sure that there are no "or"s, no branchings, here. That is why we could actually show completeness with Horn clauses. [00:07:10] All right, so that was propositional logic with Horn clauses using modus ponens, and that gives us completeness; general propositional logic doesn't give us completeness with modus ponens. [00:07:23] The other option we discussed is that maybe we should have a fancier inference rule, specifically resolution, which was the thing we started looking at. [00:07:32] Resolution was able to give us both soundness and completeness; the issue with it was that it actually takes exponential time.
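The "at most one positive literal" definition of a Horn clause can be checked directly. The clause encoding below (a set of literal strings, with a leading '~' marking negation) is my own convention, not the lecture's:

```python
# Sketch of the Horn-clause test (toy representation): a clause is a set
# of literal strings, with '~' prefixing negative literals.

def is_horn(clause):
    """A clause is Horn iff it contains at most one positive literal."""
    positives = [lit for lit in clause if not lit.startswith('~')]
    return len(positives) <= 1

# (~Wet or ~Weekday or Traffic) is the clause form of Wet and Weekday -> Traffic.
print(is_horn({'~Wet', '~Weekday', 'Traffic'}))  # True
print(is_horn({'Rain', 'Traffic'}))              # False: two positive literals
```

This is why Horn clauses have no "branchings": a rule p1 ∧ ... ∧ pk → q becomes ¬p1 ∨ ... ∨ ¬pk ∨ q, with q as the single positive literal.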
[00:07:42] This is as opposed to the linear time of modus ponens, where we keep adding only one formula at a time, so at most we end up with n formulas; with resolution we might have an exponential-time algorithm, but we end up getting both soundness and completeness, which is nice. [00:07:57] Okay, so that was all about propositional logic. At some point we started talking about first-order logic: we started expanding our logic, trying to be more expressive, to talk about variables and quantifiers, and to have a better way of representing things that are much harder to represent in propositional logic. [00:08:17] And then we talked about syntax, semantics, and inference rules for first-order logic; basically we went over the same things for first-order logic. [00:08:26] Comparing propositional logic with first-order logic: in propositional logic we have the option of doing model checking, when we think about our models and their semantics. In first-order logic we don't really have an analog of that, but we have this other thing called propositionalization: for a subset of first-order logic formulas, we can propositionalize, and that takes us back to propositional logic land, where we can use the same tools that are there. [00:08:55] Thinking about inference rules, we discussed modus ponens with Horn clauses and the fact that it is sound and complete; the same story is true in first-order logic, so we could apply modus ponens with Horn clauses in first-order logic and it's sound and complete. There's a "plus plus" here, and that plus plus basically means that we had to change modus ponens a little bit: we discussed unification and substitution, because we have variables here.
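The propositionalization idea mentioned a moment ago can be sketched as grounding a quantified formula over a finite set of constants. The string template and the predicate names below are illustrative assumptions of mine, not from the lecture:

```python
# Sketch of propositionalization (toy encoding): turn a universally
# quantified template like "for all x: Student(x) -> Tired(x)" into one
# propositional formula per constant in a finite domain.

from itertools import product

def propositionalize(template, variables, constants):
    """Instantiate each variable in `template` with every constant.

    `template` is a string with placeholders, e.g. 'Student({x}) -> Tired({x})'.
    """
    grounded = []
    for values in product(constants, repeat=len(variables)):
        binding = dict(zip(variables, values))
        grounded.append(template.format(**binding))
    return grounded

rules = propositionalize('Student({x}) -> Tired({x})', ['x'], ['alice', 'bob'])
print(rules)  # ['Student(alice) -> Tired(alice)', 'Student(bob) -> Tired(bob)']
```

Each grounded string is now just a propositional formula over atomic symbols, so propositional tools (model checking, modus ponens, resolution) apply; the cost is that the number of groundings grows with the domain size raised to the number of variables.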
because there are [00:09:22] variables here so because there are those variables you should be able to we [00:09:24] those variables you should be able to we should we should apply unification and [00:09:26] should we should apply unification and substitution to make sure that our modus [00:09:28] substitution to make sure that our modus ponens makes sense in the space of first [00:09:30] ponens makes sense in the space of first order logic [00:09:32] order logic similarly we discussed resolution and [00:09:34] similarly we discussed resolution and then it showed that it is general and [00:09:35] then it showed that it is general and it's sound and complete in propositional [00:09:37] it's sound and complete in propositional logic and in the case of first order [00:09:39] logic and in the case of first order logic we briefly discussed this in an [00:09:42] logic we briefly discussed this in an optional uh module and again like this [00:09:45] optional uh module and again like this is resolution plus plus because we are [00:09:47] is resolution plus plus because we are talking about applying unification and [00:09:50] talking about applying unification and and substitution on resolution too and [00:09:52] and substitution on resolution too and again it sounded complete even under [00:09:54] again it sounded complete even under first order logic which is kind of nice [00:09:57] first order logic which is kind of nice all right so that summarizes our logic [00:09:59] all right so that summarizes our logic lecture i just want to like leave you [00:10:02] lecture i just want to like leave you guys with one thought when you think [00:10:04] guys with one thought when you think about logic [00:10:05] about logic so so what is it about logic that is [00:10:07] so so what is it about logic that is useful again [00:10:08] useful again we talked about all the all the [00:10:10] we talked about all the all the limitations of it right the fact that it [00:10:12] limitations of 
it, right? The fact that it can't handle uncertainty, that it's not really probabilistic, that it's pretty brittle, that it's not able to capture data; as you get more data it's hard to update its rules, because it has all these deterministic rules that are built on top of each other. But it does have one big benefit, and that one big benefit is that it allows us to have a very compact and concise way of representing knowledge that we wouldn't normally have. Remember, the whole point of inference rules was that I had this logical formula, which is a very compact way of thinking about semantics and knowledge that's actually pretty difficult to represent in the semantics land. And now that I have this concise formula, I can just manipulate it, I can move it around, and I can apply all sorts of inference rules on top of it; I can come up with new formulas, derive new formulas, and prove new formulas. That's pretty interesting, and it's much harder to do that in the semantics land. So the thing that logic really gives us, its really big strength here, is this compact representation that can help us think better about formulas and do better manipulation of them. And I think one thing that would be very interesting to think about is how we could use these ideas, maybe not exactly logic, but ideas from logic, in some of the more modern AI systems, some of the more modern machine-learning-based systems. I think that is a pretty interesting view of logic that would be good to take from this class.

================================================================================
LECTURE 051
================================================================================
AI and Law I Mariano-Florentino Cuéllar, President of the Carnegie Endowment for International Peace
Source: https://www.youtube.com/watch?v=_-hBu3_Jz-0
---
Transcript

So today we have the pleasure of hearing from Justice Mariano-Florentino
Cuéllar. Tino is a professor in the law school at Stanford; he's also a justice on the California Supreme Court, and he was an official in the Clinton and Obama administrations, which is incredibly cool. Tino did his undergrad at Harvard, he did law school at Yale, and he also has a PhD in political science from Stanford. He has done a lot of work around cyber law, and actually AI and legislation, as well as other work around international affairs and public health. He also teaches a class on regulating AI, which is a very cool class, so if you're interested in these areas and these topics I absolutely recommend taking that class; especially after taking 221, I think that would be a really good class to take. We've been interacting with Tino through the AI Safety Center and the Human-Centered AI Institute over the past couple of years, and we have a project together right now on adaptive agents. So it's really great to work with Tino, it's really great to hear from him, and for me he's on the list of the top five people I would want to have a conversation with, and this includes roboticists; this is a very small list. So it's really great today to hear from Tino, and I'm excited to hear your talk. Welcome.

Thank you very much, Professor Sadigh, Dorsa, and thank you Percy and Peng and Woody. It's really an honor to be here and to share some time with you. I have to tell you that that last comment you made, Dorsa, is a lot of pressure; I don't want to let the class down and get demoted and not be on your top-five list. It's also been really great to get to know you, and I've learned so much from all of our interactions. I appreciate that you've come to
speak at my class, so it's only fair, and it's really an honor to be here. I want to take about 35 to 40 minutes, which I know in the era of Zoom is a long time, so I'm going to hope that those of you who have been good enough to tune in, and I know doing this live is optional, are going to find this worthwhile. I want us to have a lot of time for discussion, but let me just give you a quick overview of what I mostly want to do. I want to explore with you why your interest in artificial intelligence, which is what led you to take this class, is actually incredibly relevant to policy, to politics, and to law; and along the way you're going to see it's also relevant to international affairs and geopolitics. But really, in the course of this talk, I want to share with you some reasons not only why you should be interested in law and policy and take your technical knowledge and expect that it's going to be relevant to a lot of really important questions the world is facing; I also want to give you a sense of how I became really interested in this subject along the way. And I'm going to try to share my slides now so you have a better sense of what we're talking about.

So let me start by noting that right now you're at an amazing moment in your life. You're learning about artificial intelligence, and you have this extraordinary university around you, at least virtually; eventually you'll be back here physically, I hope and expect. You can look at this talk and think about it from the perspective of a technical expert, which is what you're becoming by taking this class. But before we get to that, I want you to imagine yourself not as a technical expert but as just a citizen, somebody who has to think about how this technology affects daily life: who's being affected by it, where are the inequities, what are the opportunities for understanding it better. And then, near the end of the talk, I want you to imagine yourself as a policymaker, somebody who has to make decisions about how to allocate scarce resources, where government budgets should go, and what people should do in the legislature and the courts around how to resolve the technical questions and policy questions and legal questions that arise. What you're going to find is that your technical knowledge is extremely relevant to a lot of these crucial issues, but at the same time you need to round out that knowledge by understanding a little bit about the legal system and about organizations. So the bottom line really is that I'm going to share with you a lot of different messages, but the
core message is that this technology that you are learning to master has not only benefits but risks, and in the course of implementing that technology, society is going to be shaping how that technology is used through the legal system, and also through organizations: through the associations, the institutions, the groups, but especially the firms and the agencies that so many of us are going to work in, like law firms, government agencies, big corporations, and nonprofit organizations.

Now, I want to tell you, because I know that it is difficult to hang on to your attention, but I'm going to try, that there are some things that I absolutely want to have you remember. If you remember one thing about my whole presentation, it's that the impact of artificial intelligence on the world, on your daily life, is a function of law and organizations. It's not anything that actually acts directly by itself; it has to be mediated by some organization, by what Stanford does, or what the, you know, Republican Party does, or what the United Nations does. But it's also mediated by legal rules, and along the way you're going to find that we might sometimes talk as though we're discussing the possibility of developing legal rules that will apply to AI. Well, I'm here to suggest to you that many of those rules already exist; the question is just how to translate them to this context. If I can convince you to remember two things and not just one, I'd like you to remember the point above, but then also some crucial terminology, and that is that the techniques of AI, like machine learning, are different from AI systems or applications; these are the mechanisms, obviously, that instantiate the techniques, that are attached to a user interface (I'll say more about this later), and that actually spit out information, recommendations, insights that people will then act on. And if, miraculously enough, I can get you to remember three things, and this is the last thing I really want you to remember for sure, it's the previous two points plus the point that law is kind of merging with the design and policy challenges that are implicit in AI. So what I'm going to end up telling you is that lawyers are becoming more and more a little bit like people like you, who are trying to wrap their minds around machine learning, supervised learning, unsupervised learning, reinforcement learning; and in the same way, you and your community, the people who are the technical experts, are increasingly pushing to ask questions like: what is the right way to use this technology, what do we want it to do and
not do?

So with that as background, let me acknowledge more explicitly the benefits side of the AI technology you're learning to master, because if we don't, then we're going to get a pretty distorted picture. If you were physically on campus right now, walking around Stanford, you could go to ten different places on campus where really cool stuff is happening that is relevant to real problems people are facing around the world, and where AI techniques are being used to try to make the world a little bit of a better place. So let's take, for example, the population of the world that is facing serious nutritional stress, meaning people who are at serious risk of starving. A generation ago that population was much bigger than it is now, but sadly it is still stubbornly large: 700 million people or so face serious food insecurity. These are generally the people living on a dollar a day or less; you see some of the kids here. Overwhelmingly that population is concentrated in Africa and in India and in Asia, but there are also some people in North America, and even in Europe, who face food insecurity. Think of the different ways that we have to allocate resources effectively, to make sure food doesn't go to waste, to make supply chains more efficient, to pinpoint where there are problems in real time. And what's more, a lot of this population not only faces problems around food but also around education. The distribution of access to high-quality education is incredibly unequal; as you know, we're all a part of that system, we take part in it. So when I think about the future of both nutrition and education in a world that is more equitable and more benign, I cannot imagine that future without some use of artificial intelligence techniques: to democratize education, to make the delivery of food more efficient, to pinpoint problems in real time.

In somewhat similar fashion, this quirky set of four images you see here is an example of the work that Dan Ho, my colleague at the law school, is doing with some colleagues: using satellite imagery to pinpoint where sources of pollution are, in much more accurate fashion than anything the government currently has. What that would allow us to do is more effectively cross-reference the self-reported data that comes from firms that claim to be complying with environmental law against the reality, and it takes some fairly sophisticated, but also in some ways intuitive, machine learning techniques to make use of this visual data. And then you've got a
picture of a courtroom. This is not the kind of courtroom where I sit, because it's a trial courtroom; mostly this is where trials are actually heard, in superior court. The reality is that in California, if we had more time and if we were in person, I would ask you to guess the number of cases we hear in California courts every year. Generally speaking, when I ask that question, people say 20,000, and I give a shocked response: that's too low. People say, okay, 200,000, and my eyebrows still go up, and finally we'll get to something like 800,000. Well, the actual answer is something like 6 million cases a year. So it will not shock you to hear that in probably 40 to 50 percent of those cases the litigants are self-represented. They're people like you; they don't have a lawyer; they are trying their best to navigate an incredibly complicated system. I would love to imagine a world where the distribution of legal knowledge is not so restricted, just to people who have a Stanford law degree or a similarly great credential, or who can pay a lot of money for a fancy lawyer, but where software and AI systems that you might help design can help people navigate a very intricate legal system. But at the very far right you see a picture of an African-American man under the words "criminal justice," and there's a question mark there. Why I'm doing that is to highlight that this whole world we can imagine also has its risks and its downsides, and to make this more concrete I want to focus on one person in particular. The gentleman you see here, Robert Williams, is one of many people whose lives are being affected by the fact that artificial intelligence systems are not just theoretical anymore in terms of their practical application. They're being
used in all kinds of settings, including in the criminal justice system. So here he was one day, in a suburb of Detroit, when he gets arrested by police. He's told that he's being arrested because he's suspected of committing larceny, which is a fancy word for stealing: robbing a store in Detroit. And it turns out that the police were using an image recognition system doing facial recognition, and they had a database, a corpus, of 49 million images, and the system indicated that the image from the security camera in Detroit matched Robert Williams's picture. He was arrested. Now you might ask: did the police have another reason to suspect him, were there outstanding arrest warrants for him, had he committed similar crimes in the past? And the answer is no, no, no. Okay, so once he was arrested, the police admitted that the photo is a little blurry. They admitted that they didn't have any other information about him, and after a little bit more discussion they ultimately agreed with Mr. Williams that the picture really just didn't look like him, meaning the intuitive human response was, no, that doesn't seem to be you; but the algorithm says it's you, so what do we do? The answer is he was detained for 30 hours.

Now, I'm not suggesting that there aren't worse things than being detained for no reason for 30 hours. But I'll tell you, I grew up on the U.S.-Mexico border, and it was a fact of life in my family that sometimes you need to cross over to the American side to go shopping or do something else like that, and being detained even 45 minutes, an hour, an hour and 15 minutes, all those things happened to me; it's not very pleasant. So you can imagine what it's like, or you can begin to imagine if you try, what it's like to be detained more than 30 hours and then be told that it's because a computer made a mistake. "The computer must have gotten it wrong" was the exact thing that he was told. Everything that I want to share with you from here on out you could, in a way, sum up by asking this narrow question of why this happened to Mr. Williams, and what it means. What are the remedies? Do we have a legal system, a society, where we can sort of disentangle the mistaken uses from the correct ones and manage the risks appropriately? Can we take seriously the fact that humans also make mistakes when they're looking at faces? I'll say more about that in a moment. But I hope I can press you to think about the situation with Mr. Williams in a little bit of a broader context, because we could talk about criminal justice, or we could talk about testing. As you may know, the
intellectual the [00:13:49] as you may know the intellectual the international baccalaureate exams this [00:13:51] international baccalaureate exams this last year because of code were not [00:13:53] last year because of code were not actually given but instead students were [00:13:56] actually given but instead students were given a score that was their predicted [00:13:58] given a score that was their predicted score based on the previous portfolio [00:14:00] score based on the previous portfolio work they'd submitted we can talk about [00:14:03] work they'd submitted we can talk about testing uh [00:14:05] testing uh in remote settings where your facial [00:14:09] in remote settings where your facial your image is sort of being analyzed by [00:14:11] your image is sort of being analyzed by a camera that's trying to detect whether [00:14:13] a camera that's trying to detect whether you're cheating [00:14:14] you're cheating we can talk about insurance we can talk [00:14:17] we can talk about insurance we can talk about it [00:14:19] about it 36 other domains where this stuff is [00:14:21] 36 other domains where this stuff is really effective life and the broader [00:14:23] really effective life and the broader question really is [00:14:25] question really is what does the incident involving robert [00:14:27] what does the incident involving robert williams tell us about law [00:14:29] williams tell us about law about artificial intelligence and about [00:14:31] about artificial intelligence and about how society is changing [00:14:33] how society is changing and our legal system is changing in [00:14:35] and our legal system is changing in response to this technology [00:14:37] response to this technology so that is the tip of a very very big [00:14:38] so that is the tip of a very very big iceberg now let me acknowledge again [00:14:42] iceberg now let me acknowledge again the point about how [00:14:44] the point about how there's a lot about this subject that 
[00:14:47] there's a lot about this subject that goes deeper it doesn't just start with [00:14:49] goes deeper it doesn't just start with the history of artificial intelligence [00:14:51] the history of artificial intelligence it actually starts with the history of [00:14:55] it actually starts with the history of really modern society [00:14:57] really modern society now on the screen you see the picture of [00:14:59] now on the screen you see the picture of a very intense looking man [00:15:01] a very intense looking man named [00:15:02] named max weber [00:15:03] max weber for anybody who's ever taken a class on [00:15:05] for anybody who's ever taken a class on social theory or sociology his name [00:15:07] social theory or sociology his name might be familiar [00:15:09] might be familiar there's a lot i could say about him but [00:15:10] there's a lot i could say about him but here's the main point i want to make [00:15:12] here's the main point i want to make writing [00:15:13] writing in the very early 20th century max weber [00:15:16] in the very early 20th century max weber was looking around society and observing [00:15:18] was looking around society and observing things noting [00:15:20] things noting society didn't work the same way [00:15:22] society didn't work the same way then that it did 100 or 200 years ago [00:15:26] then that it did 100 or 200 years ago many many people worked inside [00:15:27] many many people worked inside organizations with a hierarchy [00:15:30] organizations with a hierarchy formal systems of authority [00:15:32] formal systems of authority organizations had a director an [00:15:33] organizations had a director an assistant director officials clerks [00:15:37] assistant director officials clerks and all of this observed max labor was a [00:15:40] and all of this observed max labor was a means to which the modern nation state [00:15:42] means to which the modern nation state processed information [00:15:43] processed 
took it, and decided what to do with it rationally: sometimes developing the mechanisms to act as if by reflex, by recognizing the kind of problem and quickly delivering a response; sometimes by elevating it to people who could sit in an office, talk in a conference room, and come up with a solution, thinking, presumably, logically. And what Max Weber noted, much to the influence of people who came after him, including yours truly, is that these bureaucracies aspired to work like a machine, right? They were trying to automate the process of decision-making in some way, to the point that it could be predictable and rational.

[00:16:23] And Weber pointed out that that was all well and good, but there were going to be some problems along the way. And in some ways I'm here to tell you that many of the problems that Weber highlighted remain: how we have a love-hate relationship with these bureaucracies. On the one hand, we think that they're inefficient, that they're rule-bound, that they're not creative, that they're frustrating, that they're slow; but at the same time we can't live without them. That will end up illuminating, in some ways, some of the really interesting choices we have about how we use artificial intelligence: maybe in some ways to replace conventional bureaucracy, but, I would argue, in other ways to replicate, and in some ways channel, some of the same tragic conflicts and tensions.

[00:17:05] Now, channeling Max Weber to some degree, and also reflecting my own interest in AI, in 2016 I wrote a piece that had the following punchline, basically, which was: sometimes we're going to deal with the concerns we have about the role of artificial intelligence by suggesting that really all we're building are recommendation engines, not really that different from the way Netflix works: "you may also like to watch this." Judge, you may think that this person deserves a harsher sentence, but it's really up to you, judge; you don't have to be the one to decide. Or rather, you do have to be the one to decide; we don't have to be the ones to decide, we the ones who designed the AI system. We're just giving you a recommendation; we're using these techniques to give you a sense of what the likelihood is that this person will reoffend.

[00:17:50] And the point I was trying to make in 2016, which seems now like a long time ago, is that actually that line between the computer program, in particular the AI system that has sophisticated user-interface capacities to sort of speak to you in natural language, or to serve up the information in a way that's easier for you to assimilate... it's really difficult to police that line between "they're just supporting your decision" and "they're actually making the decision."

[00:18:16] Now here's one place where I can highlight my point at the very beginning about how law merges with organizations, which merge with AI, if you really want to understand the effect. Right? So if you want to know if an AI system is actually serving as a decision-support tool, rather than actually making a decision, you're going to want to know the answer to questions like: well, are the designers of that system liable if it turns out to make a recommendation that's really, really bad, that results in people getting injured? Or, conversely, is the organization run in a way that the decision maker using the AI system is being audited, and is being checked to see if all of her decisions are just exactly rubber-stamping what the software does? And if that's the case, what's the point of having
the human decision making in the loop anyway, right? So I'm giving you the sense that we're building up to this point of all these conflicts and questions, and meanwhile people like Robert Williams are getting arrested.

[00:19:12] But now let me return to this point about how humans often are not great decision makers either. So we can think about where it is that human cognition fails in terms of perception. We can think about how humans add up information and come up with a thought or a decision. We can think about what motivates humans: even if I have every reason in the world, based on my job, to be fair when I'm working in a police station and I'm deciding who to arrest, if I have an improper motivation, if I want to impress somebody who happens to be on a ride-along with me that day, or if I really dislike the person who works in this particular area of town and I want to arrest them because I have a nefarious motivation, that can mean that human decision making gets all messed up, and even the legal arrangements we have to police human behavior can fall short.

[00:20:04] But my next slide, which is probably the messiest slide of the whole presentation, so you don't have to memorize it or even read it all, you know, I can make these available to you later, but here's the punchline: the mere argument that humans are not as good as the performance of AI systems in a discrete test like facial recognition does not really answer the question of how you want AI systems to be used by organizations to make decisions, because the devil's really in the details. Let me just pick two points here to highlight.

[00:20:35] Let's talk about perception. So the field of the neurophysiology of how vision works is really, really complicated and fascinating, and it's not an accident, I would argue, that some of the coolest things that we have been learning about how to develop better image-recognition systems in the AI space are influenced by what we learned from neuroscience. But the fact that that's still a bit of a mystery highlights to you that we actually only understand a little bit about how humans make visual-processing decisions. For example, we know that it takes about 100 milliseconds for humans to perceive whether a picture reflects a person of one gender or another, generally; for humans to pick up emotions; for humans to recognize familiar faces. But eyewitness identification, remember, involves unfamiliar faces: do you remember whether this image is showing you the person that you think you saw two weeks ago, when the glass was shattered and somebody came into your apartment at night and grabbed your beautiful collection of baseball cards and left? That is a lot less exact, and as one of my colleagues explained in a dissenting opinion in a case called People v. Reed, we would be grossly inaccurate if we suggested that that is a system of identification that works really, really well. But then, of course, if you compare that to the way AI systems work: on the one hand, AI systems might be, in the lab, much more accurate than humans at picking out the similarity between two images chosen at random, not ones that are sort of known earlier the way humans might know them. But on the other hand, the ability of those systems outside of the lab to operate effectively, and particularly to detect emotions, for example, is not so great. These systems have, in a number of applications and
instantiations, real differences in how effectively they work for pictures of people who identify as white rather than for Black or Asian people. And of course you have all kinds of other failure modes, like hacking.

[00:22:41] And then, of course, we could talk about legal arrangements, and here I would just note that we humans have hundreds of years of experience dealing with human mistakes; that's really what the legal system is designed to do. We are only learning now how to adapt our legal rules and standards to deal with the mistakes that machines make. We're not starting from scratch, but it would be a mistake to assume that we've figured out exactly how to do that.

[00:23:13] So now I want to make the point that when we are dealing with problems posed by AI in the legal system, we are not starting from scratch, and the best way I can make that point is to just highlight, for those of you who are vicariously interested in asking yourself what it would be like to go to law school, what that would feel like, and you're thinking, well, maybe that would not be terrible, maybe it might be kind of fun, I'll give you a flavor of some of the subjects that people learn about in law school. And it will not take a rocket scientist, it will not take a Stanford computer science professor, to see that these subjects that we cover in law school are just literally touching right up against AI already, and it will continue to be the case.

[00:23:53] So, an area of law called agency law is where we figure out, like: if Professor Sadigh says to a TA, I want you to go across campus, and I want you to pick up this particular computer, and I want you to carry it to the other side of campus, and along the way the person picks up the computer but then gets distracted, drops the computer, and kills a bird, and it turns out that that bird is the prize-winning bird of somebody's, like, bird collection or whatever, does Professor Sadigh end up being responsible? Well, agency law resolves that kind of question: when are you responsible for the actions of others in your organization, of your agent? Now, ordinarily agency law applies to the actions that you begin to put in motion that some other human being engages in, but you can totally see how this branch of law is beginning to grapple with the question of when you are responsible for the actions that you set in motion because you designed an agent to do something, like to sort employee applicants, and then the agent does that: the artificial, software-based agent.

[00:24:54] Okay, so then you have my core field of administrative law and legislation. This is the law of what counts as sufficient justification for any action of government. If the president signs an executive order saying, I don't want the census to keep on going until December, I want it to stop in October, when does the president have the power to do that? How does that power get into some conflict, potentially, with the power of Congress to pass a law saying how long the census is supposed to continue? You get the idea. What if the government says, well, you're going to have to move out of this home because we want to build a road through here; what right do you have to challenge that kind of action? So obviously, the more and more that government decision making involves reliance on machines, the more and more this branch of law is going to have to deal with the question of what it means when the machine is empowered to play a crucial role in that government decision. Does that make it
more reliable, less reliable, more fair, less fair? When can we do that, when can we not do that?

[00:25:54] Last but certainly not least: tort law. Tort law is about who has a duty to whom, what counts as a reasonable decision, and how we attribute causal responsibility for bad things that happen. Translation: let's say you're back on campus and sadly you get COVID-19. Can you blame the university? When? Why can you blame the university, why can you not? Or, you know, forget COVID-19 for a moment; let's suppose that you're in a lab and sadly your lab partner decides to try to attack you, and you survive, but you're asking, well, wasn't the university responsible for making sure that I wasn't attacked? That's tort law. And you can imagine that as the information that is the fuel of modern AI systems, the sort of fuel for machine learning, increasingly flows to systems that are interconnected, questions about what a decision maker does with that information, and whether that information makes the decision maker responsible for a different kind of safety protection relative to somebody that could be protected, all become more interesting.

[00:26:59] Okay, so let me give you some context for how to think about these problems by just acknowledging that the history of AI is kind of long, and it does not start with the birth of the internet; it goes back further in history, to some of Professor Sadigh's colleagues in the computer science department at Stanford. So I could go on and on about this, but my little subtext, in addition to what I want to share about the history of AI, is to kind of quickly give you a sense of how in the world I became super interested in this: beginning a little in college, but then again when I worked in government at the Treasury Department, and even more so when I came back from working for Obama in 2010.

[00:27:36] So just look at those pictures for a moment. You might recognize some of these faces; I'm sure you recognize at least one, the one with the woman in red, as it were. But if you go back a little further, what you're going to see in the picture under "1950s" is Herbert Simon, a really, really smart man, whose parents were refugees from Germany, who spent most of his career at Carnegie Mellon University. And, I mean, come on, you have to be pretty smart if you start as a political scientist, but then become interested in psychology, end up writing about economics and winning the Nobel Prize in economics, and along the way become, like, a major pioneer of AI. That was Herbert Simon for you. He was so brilliant; I recommend to you any book or article ever written by Herbert Simon.

[00:28:24] Among other things, one of the reasons he won the Nobel Prize in economics is because he developed the notion of bounded rationality, which is at the core of what we call behavioral economics now: the notion that you may be best modeled, as a human, not as somebody who's trying to optimize, but as somebody who's trying to satisfy a certain threshold. And we can certainly use that insight to imagine how to design a software agent and how to do machine learning, which is one reason why you can imagine his expertise and brilliance got transferred over into AI. He's most associated, in AI, with the story of the development of systems to do, basically, like, first-order logic and mathematical-type reasoning, what some refer to as good old-fashioned AI. And I'll just note here that that was really important, but always treated as the holy grail, maybe; something was elusive and not possible to realize: the kind of instinctive, almost automatic decision making and motion that now is so much at the cutting edge of what we are helping robots and AI systems to do. The recognition piece was missing, even if the cognition piece, at least around how you prove theorems, was possible to instantiate early on.

[00:29:40] Briefly: by the 1970s the picture really is different. Here that picture includes Ed Feigenbaum, somebody who is a colleague in computer science and always someone fascinating to talk to, somebody who's been one of my mentors, a little bit, in trying to learn about AI. And he's very much associated with expert systems, with taking insights from not only the work of Herbert Simon, who was actually Ed Feigenbaum's advisor, but also from psychology and sociology and decision theory, to develop systems that could act
almost as experts and replicate knowledge in particular domains.

[00:30:19] And then by the 2000s, the real phenomenon that changes everything, and certainly gives rise to the prominence of the person in the third picture, Sheryl Sandberg, is the rise of the internet. Because of course, all this stuff about AI was happening partly in academic labs and partly in defense departments, but suddenly the ability we have to harvest and centralize billions and billions and billions of pieces of behavioral data from humans, and of course to do it in systems that work faster and have access to more computing power, lets us do some truly amazing things. And I'll just note here that my interest in this begins in college, in '93, when I was trying to understand how human decision making could be modeled, so very much the Herbert Simon sort of work. But when I was working in Treasury in the late '90s, it wasn't lost on me that there was just so much data that the US government had gathered around financial transactions. And I was interested in privacy, as you might have been, but also interested in the idea that, if that data were available, how could it be used in a way that was efficient, lawful, and analytically sophisticated, to detect really, really problematic uses of the financial system, including to commit corruption, for example, or to launder money, and so on. And so I became exposed to some of the techniques that you're learning about in this class right now.

[00:31:41] When I came back from the Obama administration in 2010, it struck me that so many of the domains in which I was working, particularly around public health and criminal justice, were already being affected by early examples and applications of this stuff. So I became really interested in trying to understand how this stuff would affect every aspect of decision making in law and political science, and in trying to learn more about what you're learning about right now.

[00:32:08] So here's where I want to highlight where my own thinking went after I returned from the White House in 2010. It struck me that some of the most interesting work happening in AI, in universities but also in the private sector, was about pushing the boundaries of analytical techniques to discern patterns: unsupervised learning, reinforcement learning, and so on. And the breakthroughs were really extraordinary, and they continue to be. But it was also striking to me that these techniques, in their raw form, were not necessarily designed to influence or help non-experts; they were not necessarily designed to solve real-world problems.

[00:32:41] So if instead you're looking at how AI techniques get used, like they were used in the arrest of Robert Williams, you're not dealing with AI techniques by themselves; you're dealing with AI systems, which my co-author and I defined, using probably a little too much mumbo jumbo, as a socio-technical embodiment of policy codified in an appropriate computational learning tool. So: a system to gather data and learn from the data, embedded in a specific institutional context, meaning it fits in an organization and is given a certain purview. People who make decisions are told, here's how you can use the tool, here's how you shouldn't use the tool. And really what that means is that when you want to understand how AI is being used in the real world, you have to understand relationships of power: who gets to decide that the system works the way it does, and that somebody can point to that system and claim
that it embodies some kind of intelligence.

[00:33:42] Why does this matter? Well, it matters because now, here, we get to the other side of the coin of the internet, right? We're not in a world where this is mostly happening in the lab. Right now we're in a world where really important things are being affected by AI. I cannot give this lecture without pointing to the toothbrush that somebody recently gave me as a gift, which advertises how it uses artificial intelligence to learn how to brush your teeth. And this is the genesis of a concept I call toothbrush maturity: when technology gets to be so ubiquitous that it intersects even with a toothbrush, then you know that you're dealing with something that has to be understood in its real-world context, and not just in the theoretical stories you can tell about how well it's going to work. Another example of this, really, though, is that the very large internet companies that are around us in Silicon Valley have a market capitalization that you can't really explain without understanding just how well online advertising must be working, and how much it's leveraging the enormous amounts of data that are generated by the internet and analyzed by some of the AI techniques that you are learning about here.

[00:34:48] Where is this going? It's so interesting, and the short answer is, I really wonder whether anyone fully knows. And that's true of almost any technology, right? You can't always predict. By the way, I'm about three quarters of the way through the presentation, so just bear with me for a few more minutes. I can point to the different things here, but the main point I want this slide to highlight for you is that some of the breakthroughs that we're seeing right now are not so much progress in terms of just more clever algorithms, or even more different data; it's partly just leveraging more and more computing power. I wonder where that's going to go; I don't know that that's sustainable.

[00:35:25] But I do think that if you want to get a sense of where this field is going, think a little bit about language in particular. Because if I go back and think a little bit about how government agencies were making decisions in the late '90s when I was there, most of the expert analysis was being done using techniques like probit and logit, econometrics like regression, stuff you're going to be learning about in this class, but it was being mediated through humans presenting to each other. What AI systems may increasingly have the capacity to do is to use those very same techniques, but to then communicate with the user in a way that is adaptive to the human, and able to leverage language in a way that software previously did not. So that persuasive ability of software is something we have never really seen before. And as we have more effective use of compute, and greater use of compute, I think the feats that will be possible when you marry up the GPT-3-type stuff with the analytics will be very different. Which is to say, a lot of the humans who are consuming the output are not necessarily going to be in a great position to be very sophisticated arbiters of whether what they're being told or recommended is accurate or not.

[00:36:36] Just to wrap up: there are all kinds of interesting intersections now between the law and AI, and policy problems that result. I want to make a pitch to you, and this is kind of tentative, I'm not as certain about this as I am about other things: that we're actually having
this really weird, bifurcated, bimodal distribution of attention to the problems, where some problems now are so familiar, even if we don't necessarily know how to solve them, that you will hear the buzzwords very often: explainability, interpretability, bias, privacy, etc. And these problems I think of as not medium- to long-term problems; they are present-day problems. They have hit already. Just ask Robert Williams.

[00:37:18] And then when you see an interview with, like, Elon Musk, you're going to hear about catastrophic or existential risk. I think that it would probably be a big mistake to ignore catastrophic or existential risk, much as I would have argued in the 1960s, if I'd been alive and an adult then, that anybody who was interested in the future of fossil fuels, even if we didn't have all the science, would probably be making a mistake if they completely ignored what the risks might be, if they were trying to understand the risk, systemically, for the planet, of the use at scale of these techniques for producing energy, once the rest of the world, meaning poor people in Indonesia, India and Africa and China, began to demand the level of consumption of energy that Americans and Europeans had taken for granted. But I still think that, in some ways, the catastrophic or existential risk piece is not a risk that I believe the world is likely to be facing in five years, or eight years, or ten years. That's maybe something we can go into in the Q&A, why that is, but I suspect that the level of delegation we have already engaged in with AI systems doesn't get to the point where they can protect their purview and power, without our intervention, as well as they might someday. And obviously that requires further thinking.

[00:38:43] But that leaves some seriously interesting issues that I think really deserve attention more in the short term. For one, this question of where causal responsibility lies when a system that deploys AI acts in a way that is not safe. Think about the autonomous vehicle, but not only the autonomous vehicle; think about the AI system in a large company that increasingly is making financial decisions, reviewed by humans, perhaps needed by humans, but increasingly in an autonomous way. I think that problems involving power and collective action are really interesting in this space. So if you're running a large company, and suddenly 27 percent of those jobs, or the functions done by different people in different jobs, are now going to be done by AI systems, how does that redistribute power within the organization? How does the advent of lethal autonomous weapons influence the distribution of power in geopolitics, for example? How does it empower countries with smaller armies, and so on?

[00:39:46] Another point, which is familiar to people working on cars in particular, is that precision can spur disagreement. Right now, a lot of legal rules are written in fairly general terms, which is to say: humans are not supposed to drive when they are impaired; they're supposed to engage in driving that shows reasonable care, etc., etc. These are fairly vague descriptions, and the courts figure out what they mean in particular fact patterns, with the help of a jury. But when you can actually program an automated system to make split-second decisions that are extremely precise about when and how to prioritize exposing some smaller number of humans to risk when you can save a larger number of humans, like a variation of
a trolley problem, that will spur disagreements that didn't exist before. Just like mapping technologies, when they developed and became more precise, spurred disagreements between countries that previously shared borders in very inhospitable locations, when the border could not really be traced with quite as much detail and specificity.

[00:40:51] Just to mention maybe one or two last quick things on this slide: I think it's going to be really interesting as AI systems pose the question of what it means to maximize social welfare. Like, how do you design a system that is going to have as its core attribute, this is what you want it to do, and some people are trying to do that, that it's going to try to keep humans safe, or it's going to try to avoid doing anything that will imperil too many people? Taking human values and turning them into code is actually really, really difficult. And it is related to the process through which humans think about change and conflict, which is to say: we often deal with conflict through institutions like courts and legislatures. Increasingly, as we deal with conflict through machines, we'll have to program machines to help defuse conflict, and not only to point out how two views that seem to be very similar are actually in tension with each other.

[00:41:44] All right, so we've gotten to my very last slide; I'll end here. There's probably too much text on it, but here's what I want to highlight. If you are listening to this lecture and you're thinking, "I hope that part of my career is spent thinking about how I can help move AI, and the design of AI, so that it is socially beneficial," I want to highlight to you that that is actually really difficult to define, probably in ways you've already anticipated. But I want to highlight in particular a tension between two different ways of thinking about what the social good is, for purposes of AI and pretty much everything else. In one version of what it means to work for the social good, you basically develop systems that increasingly are good at giving people what they want: what they say they want, but especially what their behavior indicates that they like and that they value. So the entertainment that they want, the products that they want, the classes that they want, the kind of teaching that maximizes student evaluation feedback, and so on. But of course, part of what makes life so interesting is that there's sometimes a separation between what people say they want and what they actually want, or between what people say they want and what they do, or, for that matter, between what people want at time one, when you started listening to this lecture, and what you want right now, which is probably to stop, right? And once you start admitting to the idea that human welfare is more complicated, and further, once you start designing systems that are in real time shaping human affect and culture and behavior, it actually becomes really, really difficult to know where to land: how to take advantage of the human knowledge you have to know how to make humans better off.

[00:43:21] I don't know how we solve that problem. But I do know that the things that I do as a judge, and the things that we do in law schools as lawyers, and the things you do as technical experts, are increasingly merging, and I don't think that we can answer these really tough questions without acknowledging that our bodies of knowledge have a border that is increasingly becoming really blurry. And with that, I'm going to stop, and thank you for listening. I'm looking forward to your comments and questions and feedback, concurring opinions, dissents, whatever you want to share.

[00:44:02] So I think, yeah, the way we're going to go for questions and thoughts is: either raise your hand or put it in the chat, and then they'll just call on you or read the question. Great.

[00:44:23] I should add that if we were in a real classroom, what law professors do is we call on people. So I would call on people, but I can't really call on people, so I'm going to wait for your questions.

[00:44:34] I have a question. You mentioned that a lot of the laws, right, that will be used for AI actually already exist. Can you give an example of an existing law that you think will be used for AI applications or systems pretty soon?

[00:44:52] Absolutely, great question, thank you. The short answer is, let me start with the common law, which are the subjects that we
teach law students in the first year, and which are sort of defined by the fact that the law is a little more judge-made. As you learn in basic civics of a system like the American system, in most cases the legislature is elected to enact what the law is, then the executive branch implements it, and then the courts judge and interpret what the law means. But there are certain branches of the law where, in our Anglo-American tradition, the law is actually first developed by judges over time, little by little, case by case, and then the legislature jumps in and tweaks the law this way or that way. Those bodies of law include contract law and tort law, and both of those are so clearly about AI at some level. Contract law is the law of the promises that we make to each other, and of when they're binding and when they're not. When Professor Sadigh promised you a good class, I think she's delivering, as are her colleagues in this class. But if you say, well, the class wasn't good, I've been defrauded, the law will try to determine whether there's an actual legal claim that you have or not. So in the AI context, just imagine for a moment that increasingly transactions are being made by two AI systems making spot contracts with each other in a split second, because you've pre-programmed one to say: as long as this stock falls below this price, buy a whole bunch of it. And when the lines of supply and demand cross, because two AI systems are talking to each other, the deal is made. But then it turns out that maybe these were class C shares of stock, not class A. So who gets stuck, you know, dealing with the cost of a transaction that is not what both parties wanted? Existing contract law has a lot to say about that.

[00:46:44] Now, tort law is an area of law that, frankly, when I was a law student I thought was really boring. When I was a law professor I thought it was kind of technically complicated and not that interesting to me; I had my hands full teaching other stuff. As a judge, I think of it as fascinating, really, really interesting. That's the body of law governing when your conduct harms other people and when you are liable for that. There is no way to have a discussion about automated cars without tort law being a big part of it. So, to what extent is the designer of the software that runs the vision system for the car responsible for the person who gets run over, versus the person who runs the company that tested the software, versus the company
that designed the car and marketed it to you, versus the driver who pushed the car to operate in really bad weather? Tort law is really complicated, but I will give you one quick insight, which is very intuitive. One workhorse concept in tort law, which has been on the books for a while, is the notion that the law should pay attention, other things being equal, to who in that chain of causation was in the best position to have avoided the harm at the lowest cost: the least-cost avoider. And you can see how that would be a really interesting and important question here: who could have done a very little bit to keep that person from getting run over? There are dozens of other areas of law, but this to me is a really good example of how, when Silicon Valley says, oh, we have to decide whether AI is going to be regulated, I think there's a little disconnect with reality.

[00:48:25] Okay, I think we have... go ahead, yeah.

[00:48:34] Yeah, thanks, I have a question. These systems are probably very opaque to judges and to the people who actually have to make decisions about, you know, what happened and how it interacts with case law and statutes and whatever. My question is: if, as a judge, you have to answer a question of fact about a particular AI system, do you just have experts come in and testify, on the theory that nobody else has any hope of understanding it, so we're just going to take what they say as gospel? Or do judges and clerks actually try to educate themselves on the math behind how this stuff works?

[00:49:14] That is an excellent question, and I have good news and bad news. What do you want to hear first?

Let's hear the good news.

Okay. The good news is that in some ways the problem you've just described is not
completely new to law. In our system we have a kind of interplay of decision making. It involves jurors, who are asked, subject to instructions given by judges, how to interpret an ambiguous fact pattern: lay people who are supposed to be making their best conscientious effort to do the right thing and follow the instructions. Then there are experts, who in an adversarial system are selected carefully, vetted carefully, and debriefed before they come before the court, then subjected to cross-examination, and who can shed light on things and help the jurors and the judge make decisions. And then there are judges, who are supposed to resolve questions as a matter of law, questions that ultimately are more about how to interpret the legal issue itself. If, for example, a statute says that a quote-unquote "highly autonomous system" shall be regulated subject to subpart J, then the question of whether this is a highly autonomous system or not is a mixed question of law and fact, and the judge, for example, might do part of it. In particular, the expert-testimony piece will frequently involve experts who come and pick at the really intricate, math-type questions you're raising. I wouldn't say the system is perfect, but I would say it works okay. Other contexts where the highly technical gets adjudicated, much as it would in an AI context, would be DNA evidence, for example: base pairs, and what does it mean if you say the match is 1 in 1.7 billion? How do you know that? What's the difference between 1 in 1.7 billion and 1 in 3 million, which is what the other expert said? We have a way of dealing with that.

[00:51:10] Here is the bad news, though. I think the bad news is that none of the
technologies in the past have had the potential that AI systems do to talk back. And I think that is not a small thing, because what it means is that these AI systems can be designed in a way that creates a bit of a comforting illusion that even the experts understand what's really going on, when they may be influenced more by design choices that can be really, really hard to arbitrate: cases where the AI system is actually maximizing not the accuracy of what it conveys to the user about the mathematical basis for a conclusion, like the conclusion that this person is likely to reoffend, but instead maximizing the probability that the decision maker being influenced by the AI system is going to agree with it. And that could even be an expert who's testifying, right? So this is where AI accountability begins to merge with cybersecurity, because ultimately cybersecurity problems are very much about how, if you go back literally to supply chains and how you can mess up the very core architecture of how a microprocessor works, the ways you can bias results can become incredibly difficult even for an expert to pick apart. And I don't think we have a great answer for that. There may be blockchain-type, really fancy ways of using, you know, hardcore encryption to have greater confidence in results and to know when things have been messed with. But somewhere along the line there are humans, and humans are imperfect, and I just worry about that piece of it a lot.

[00:52:59] Awesome. We have a couple of questions in the chat. Are you able to view the chat?

Let's see, I see three participants with raised hands. Do you want to take
both of them together?

Yes, let me see. From a global perspective: do you see any collaboration between countries in regard to AI adoption, or will AI systems from countries with different values be prevented from being adopted? Okay. And then, similarly: just as developing countries strive to attain the same quality of life as developed countries, it seems just a matter of time before AI becomes the next such thing. How should we think about international cooperation and rivalry in AI development? Is there anything we can do as technologists to help? How should we navigate a time when the U.S. has put export restrictions on AI software?

[00:53:56] Okay, great questions. Let me start with the second one a little bit. [00:54:04] I think that as technologists you can probably help by trying to make sure that the hype doesn't run away with the discussion of these issues. I can find people who are in the national security world, and I can find people in the public intellectual world, who will see the relationship between the U.S. and other countries so much through the lens of rivalry that very little space is left for any collaboration between scientists, for example, or between civil society nonprofits that are trying to reduce the risk of climate change by using machine learning tools, or whatever. And I think technologists will be important voices in saying: we can be legitimately concerned about how differences in technological development affect geopolitics or relations between different countries, but we should not run to the conclusion that everything is pure competition. I bet there are people in this class who were not born in the U.S. Well, I know that's the case, because I wasn't born in the U.S., and I know that's true of others of you. To me that's a really poignant reminder of the risks of having the conversation about AI shut down to the point that it becomes too one-sided and too much just about national advancement. In no way do I want to deny, and this gets me a little into the earlier question, that there are different agendas, different goals, different geopolitical objectives of different countries, and that getting some advantage in AI technology can translate into potential military and economic advantage. So some balance has to be struck, and that requires careful discussion; it requires norms; it requires some cooperation from universities, because we think of universities as working best when things are mostly pretty open. In the university we share knowledge: I learn from you all, you learn from me, and so on. But the reality is
sometimes careful lines have to be drawn. And ultimately that does reflect the reality that countries have different values. I'll just mention one word that is a simple way of making that point, though there are other examples I could give you. The word is privacy. I can imagine many countries in the world that could argue either that their populations simply have no particular reason to value privacy the way Americans do, or that, whether their population values privacy or not, their law is such that they prioritize other things, and they're simply going to gather a ton of data about a ton of stuff. I don't think that data translates automatically into greater power in the AI space, but it is a significant advantage if you have it. It will be interesting to ask, in the coming months and years, to what extent some combination of reinforcement learning and the generation of artificial, sort of fake, data, or fake information inputs into a reinforcement learning algorithm, can make up for raw access to real-world data. But I still think a corpus of actual human behavior is quite something. And if you simply didn't have to worry about privacy, the insights you might get into the expressions on people's faces when they're having a private conversation about something incredibly sensitive, or into the fear you can see in somebody's eyes when they begin to realize they've said something on social media that is likely to bring a knock on the door from the police, are valuable, particularly if your goal is not only to improve the lot and well-being of the population but to control them and to limit the extent to which they push against you. So I think this leaves us in a really challenging space. I would really, really urge all of us to highlight the importance of some public collaboration across borders on this. We don't stand much of a chance, I think, of getting to where humanity needs to get on so many crucial issues, including, by the way, AI safety, if we don't have some sharing of information. Pure competition is going to drive a lot of the most dangerous and riskiest technological experimentation on the ground, at least for a while. On the other hand, I think we'd be naive to think that everybody shares the same interests. So some degree of norm-building and cooperation among communities of people in civil society, in the world of nonprofits or philanthropy or education, will I think be really crucial.

[00:58:33] There's a question? Yes.

The general question, this comes to be real: if I just say I've developed projects
if i just say i've developed projects and i have to connect all the [00:58:44] and i have to connect all the information from the network [00:58:46] information from the network uh [00:58:47] uh so what should i [00:58:49] so what should i share or take care of [00:58:52] share or take care of something like the copyright [00:58:54] something like the copyright privacy [00:58:56] privacy and the commission from the source [00:58:57] and the commission from the source society or even some citation i have to [00:59:00] society or even some citation i have to prepare [00:59:02] prepare this kind of thing [00:59:08] yeah yeah thank you i this is a really [00:59:10] yeah yeah thank you i this is a really big subject let me just try to abstract [00:59:12] big subject let me just try to abstract a little bit from your good question [00:59:14] a little bit from your good question you're basically [00:59:16] you're basically raising the broader question of how we [00:59:18] raising the broader question of how we might think of ownership and [00:59:20] might think of ownership and responsibility over data particularly as [00:59:23] responsibility over data particularly as people work together on these projects [00:59:25] people work together on these projects that mine huge amounts of data [00:59:28] that mine huge amounts of data and use it in maybe different ways i [00:59:30] and use it in maybe different ways i mean [00:59:32] mean i and i'll give you a short answer but [00:59:34] i and i'll give you a short answer but then i'll elaborate a little bit the [00:59:36] then i'll elaborate a little bit the short answer is increasingly the world [00:59:38] short answer is increasingly the world is waking up to the fact that control [00:59:40] is waking up to the fact that control over data really is control over [00:59:41] over data really is control over property in some ways so just as you [00:59:44] property in some ways so just as you might have a [00:59:46] might have a 
use agreement that says to somebody, you can license the use of this technology that I've developed, let's say a piece of hardware like a camera that can see really well at night, but by using this camera you are agreeing not to use it to look into people's homes without their permission, or something. So increasingly, I would say, there is a criss-crossing regime of law, some of it state law, some of it federal law, pertaining to particular classes of data, like medical data, that really highly regulates the use of data.

By the same token, are there still opportunities to harvest, even scrape, data from, say, the public internet that can then be used in different ways? Well, of course, and sometimes that will allow breakthroughs to occur in AI. But this is where it gets really tricky, because we're really in the midst now of developing national, and then eventually maybe global, norms about what it means to appropriately design AI systems that will get rid of data that are no longer needed for the original purpose. So let me give you two competing perspectives on that.

The first is that because AI systems are so capable of developing insights, using the techniques you are learning in this class, that discern patterns in the data that human intuition would not have been able to detect, big masses of data used in new ways are risky: it means that maybe you end up getting embarrassed by people discovering that the fact that you like a certain kind of literature, and that your eyes move in a certain way when you're in a conversation, means that you have a really short attention span and can't be trusted with a certain kind of job, right? So those questions are partly being mediated with
respect to, like, do we not create norms that once data are used for the purpose for which they were collected, we destroy the data?

Now I'll give you the perspective of me, the academic. A lot of my early work as a law professor was historical. I would actually look at old memos and documents going back to presidential decisions made in the Roosevelt administration, where I was looking at what happened when Roosevelt was trying to reorganize the government on the eve of World War II, and how he was trying to protect certain programs from being defunded as the country was getting ready to go to war. By the way, as a little subplot on that, one of the things I learned that I was not expecting is that there was a big biological weapons research program being funded with White House support, despite the fact that that arguably contravened, certainly was contrary to, statements the White House had made, and arguably contravened certain aspects of legal norms at the time. But long story short, the point is: if the norm had been followed that you only keep the data for its original intended use and then get rid of it, where in the world would I be able to write the stuff that I did? How would I be able to do it? And you could say, okay, well, those are presidential records, that's different. But in general, the historians who write about how humans lived 150 years ago are doing it with data that were not intended for historians. So I think we have to strike a balance. But I would say you should assume, whenever you're dealing with data, that there's probably some rule, and if there isn't some rule in the law, it's probably in human subjects requirements at a university.

[01:03:17] [Audience question] Yeah, and my question is with regards to
the out-of-lab versus in-the-lab behavior of models, which you mentioned. So let's say an organization develops a model, let's say for self-driving cars, for example, I'm just picking that. 99% of the time, inside the lab, the model works very well, and outside the lab, you know, it causes the loss of a person's life. I'm taking an extreme example here. So if this kind of situation comes to you as a judge in your court, what is the decision making on the law side with regards to the AI model, and what is the thinking that goes into making this kind of judgment against the maker of the model? It's working 99% of the time, but there are failures, because, you know, the models are true to what has been coded and what the training has been done on. So I'm just wondering about the decision process: is it tied to the risk, or to whether the company has done the due diligence, or what is the essence of the responsibility? So that was the broad question. Thank you.

[01:05:01] Thank you. So that's another good opportunity for me to share with you how much existing law is already grappling with these issues that arise that are so relevant to AI, and particularly to the scaling up of AI outside the laboratory. I'll preface this by saying that because I'm a sitting judge, I wouldn't want you to feel like I'm telling you exactly how I would decide the case if it came up, because we actually have cases not unlike what you're describing that are pending in the California courts, and I'm not supposed to say how I'd decide them. But I can tell you in general the bodies of
law that are relevant to trying to deal with this question, and in what direction they've moved over time. So we have bodies of law, particularly in tort law and in contract law, and in consumer protection law more generally, that can use basically three sorts of techniques to deal with the risk that you're pointing to, which arises when you go from highly efficacious behavior in the lab to what happens when some technology is operating, quote unquote, in the real world. So when a vision system, for example, is tested in controlled conditions, it works fine, but now you're putting it on the front end of a car that is going to operate mostly autonomously, and it's driving around Palo Alto, and then even driving around some much more irregular environment, like some dusty unpaved road in northern Mexico.

So one body of law is tort law. Again, this is the body of law involving the duties that you owe, for example, to others as a company or as a person. And here a core insight of tort law is that you have a duty of care. If there's a theory of negligence that is being, well, let me put it this way: if the claim is that the manufacturer should have been more careful than the manufacturer was, and the manufacturer owes a duty to the person who's using the product, which is its own separate question, a crucial issue will be the extent to which prevailing norms in the industry about how much testing happens outside the lab were followed or not. The more those norms converge, the easier it is for the company to say, perhaps: well, look, we did some outside-the-lab testing; you know, you could easily spend a billion dollars a day testing outside the lab infinitely, but we did enough testing that we met the industry norm.
A different technique would be to rely more on contract law, where you could say: I was sold a product that was guaranteed to me to have a degree of safety and efficacy, and in fact it didn't reflect that, because it wasn't tested outside the lab, and the promise that was made to me was not that this product was just tested in the lab; it also implied a lot of testing outside the lab.

And the third strategy is more administrative regulation. This is like what the FDA does with respect to pharmaceutical products, and here the key insight is that we don't rely purely on tort law or contract law; we actually have the government saying to you: you can only sell this product if you've tested it in a particular way. And as you go through that pharmaceutical approval process and get into phase two, phase three, phase four trials, effectively what you're doing is going further and further outside the lab. So we can do this to some degree. There are going to be some nuances, but I think it's really important to remember we have different tools we can use to deal with that risk.

[01:08:35] Thank you. [Moderator] Notice we have a couple of questions in the chat, so let's do some of those and then come back to the raised hands. Yes, I think maybe we have time just to take these two questions. So first, on the criminal justice system. [01:08:55] Oh, I see that, okay, I'm going to just read it out: if we find that an algorithm deployed in the criminal justice system doesn't explicitly take race into account but is systematically discriminating against Black people, perhaps COMPAS, what are the legal ways to counter that discrimination? Is it constitutional to take into account a protected characteristic like race? Oh great, okay. Or does it mean we
just shouldn't use the algorithm? Okay, excellent question. So the short answer is that the legal system in America has, for good reason, long been deeply concerned with racial inequities. Whether it has been sufficiently concerned is a question that we can leave for another day, and others can talk about, but it has been concerned about it, and that concern is reflected in several parts of our legal system. It's reflected in a lot of statutes at the state and federal level against discrimination by race, like provisions against discrimination in employment and hiring. But it's also reflected in the Constitution, for example in the Equal Protection Clause. And here I would just say the legal system treats uses of race differently from other classifications that people might be subjected to. They are subjected to what the legal system calls strict scrutiny, which is a very, very demanding form of review where essentially explicit classifications are not permitted unless there's a very compelling, strong justification on the part of the government and there's really no realistic way of doing it in a different manner.

Where it gets more complicated with something like COMPAS is where there's no explicit racial classification and you still have biases. And here I just note that you can find algorithmic ways to reduce that bias, or even make it disappear completely, and there might even be legal reasons to do that. But generally when you do that, one of two trade-offs will happen. You will either increase the likelihood that other variables that may not otherwise be so consequential become more consequential,
so to some people that might mean you're introducing a certain kind of different bias, although we may not care about it as much because it may not be racial; or you simply reduce the accuracy of the overall model in some cases. Now, that may be entirely sensible to do, but these questions about when and how you recalibrate the results of a decision-making process, because it doesn't take race into account and yet still gives unequal results, are a very familiar and vexing and difficult question in criminal justice and in the legal system more generally.

[01:11:39] [Moderator] Let me just take the last question then. Which one is it, the one that says "via advertising"? Complicated question, actually. Okay, so my question is regarding that one: how much tolerance would society have when AI makes mistakes? On the one hand, we say humans are not perfect; on the other hand, we want to see how much a company can improve an algorithm to avoid an accident caused by their AI-powered product. To what extent should AI product manufacturers optimize their product for it to become acceptable? Great question.

[01:12:10] I think this gives us a chance to end where we began, actually, because when I started, I highlighted to you that not only can AI have many benefits for society, and I named some, but also that the relevant comparison is not to perfection but to what imperfect forms of decision making we might have to rely on if we don't rely on a particular AI system. But it would be wrong to conclude from that discussion that as long as AI systems are more accurate than human decision makers, there's no legal problem or no policy problem. I think instead the reality is that as AI performance improves in a very discrete domain, two things might happen
that are relevant to the answer to your question. The first is that we may come to understand and trust AI systems better to make that discrete decision, so long as they don't introduce some other biases that we think of as even more concerning. So notice the point that I made about how AI systems might be really good at picking out faces that are unfamiliar, relative to humans who are picking out unfamiliar faces, but if they're trying to discern emotion, they may not be as good as humans right now. That might change over time, but right now it means that we have to be very specific with respect to what we're expecting a system to do, rather than presuming that there's a sort of halo of efficacy beyond the very narrow context in which it's been tested, which might have to extend just beyond the lab, right?

Number two, as AI systems get better in general, the standard of care that the legal system uses to discern whether something works effectively or not will begin to be redefined, so that it's not just human efficacy; it's a well-performing AI system's efficacy, and not only a well-performing system but ideally a well-performing AI system that does not have a built-in set of biases that we consider problematic. So that means, for example, if over time 70% of human passengers were in autonomous vehicles rather than human-driven ones, then a faulty form of performance from one of those vehicles that increases the risk of harm, and actually results in somebody getting harmed, might still be actionable, despite the fact that even that faulty system works a lot better than human drivers did. It also means that even if AI systems are better at discerning new faces, if
their accuracy is much much better for white faces than black faces [01:14:45] for white faces than black faces that would be a policy and potentially [01:14:47] that would be a policy and potentially legal problem for some [01:14:49] legal problem for some that might require remediation and [01:14:51] that might require remediation and attention even if the system works [01:14:53] attention even if the system works better than humans [01:14:55] better than humans all of which is why i think these [01:14:57] all of which is why i think these problems are going to keep your [01:14:58] problems are going to keep your generation busy for a long time [01:15:02] thank you all right [01:15:04] thank you all right thank you so much you know thanks for [01:15:06] thank you so much you know thanks for the great talk and awesome discussion [01:15:08] the great talk and awesome discussion and there are lots of interesting [01:15:09] and there are lots of interesting questions this is really fun [01:15:11] questions this is really fun uh so let's thank you know again [01:15:15] uh so let's thank you know again thank you everybody i really enjoyed [01:15:16] thank you everybody i really enjoyed this and i appreciate your very [01:15:18] this and i appreciate your very thoughtful questions and best of luck [01:15:25] you ================================================================================ LECTURE 052 ================================================================================ Stanford Fireside Talks: Robustness in Machine Learning I Robust Machine Learning Source: https://www.youtube.com/watch?v=xr8AHGlieOE --- Transcript [00:00:05] so today we're pleased to have tatsu [00:00:08] so today we're pleased to have tatsu hashimoto here with us um tatsu did his [00:00:11] hashimoto here with us um tatsu did his phd at mit did a post talk at stanford [00:00:15] phd at mit did a post talk at stanford spent one year [00:00:16] spent one year as a researcher at microsoft 
semantic machines and he's joining stanford um as [00:00:21] of last month as a fresh assistant professor so welcome [00:00:25] welcome back to stanford he'll actually be teaching 221 in the winter [00:00:30] so if you like his talk you should go tell all your friends to have them take 221 in the winter [00:00:37] so tatsu has worked on a number of areas from computational biology text [00:00:42] generation and nlp but he's probably really well known for his work on [00:00:48] you know robustness in machine learning [00:00:50] and i think throughout this course um we've emphasized that machine learning is something that's really [00:00:58] being deployed in the real world all over right now and having real impact in [00:01:03] the world just last week we heard from you know about this so i think [00:01:07] robustness of machine learning systems is a really really important area and [00:01:11] tatsu is an expert in this so i'm really happy to have him tell us [00:01:15] what robustness in machine learning is all about and where things are at the [00:01:19] moment so take it away tatsu [00:01:23] okay great [00:01:25] um so i want to start with emphasizing sort of what percy already said which is [00:01:30] that there's been this enormous and rapid progress in machine learning [00:01:35] over the last decade or so and especially in tasks like image [00:01:39] recognition um 10 years ago [00:01:42] errors were at the level of like 20 to 30 percent and human level performance [00:01:46] was you know sub uh seven percent [00:01:50] and there was this huge gap in performance and everyone said it'll take [00:01:53] a long time to reach human level performance [00:01:56] but nowadays really human level performance is being achieved on all [00:01:59] sorts of tasks image recognition as of [00:02:01] say 2015 but also in tasks like natural language processing and much more [00:02:06] challenging reasoning-based tasks [00:02:08] these systems are now getting really close [00:02:10] if not exceeding human performance [00:02:13] and so machine learning has really achieved these sort of great successes [00:02:17] and they're being deployed and we can sort of ask what has machine learning [00:02:20] been good at and what is it good at and [00:02:22] it's really good at extracting patterns from training data [00:02:25] and applying this on a test distribution to do some prediction [00:02:30] and so we can think of this as you know the classic digit prediction task you have [00:02:33] some images of digits and you need to return the you know numbers that are [00:02:36] associated with them as long as sort of the source and the target distributions [00:02:40] look the same [00:02:42] modern machine learning systems based on large amounts of data and neural nets [00:02:46] are going to do exceedingly well on these tasks but really the challenge is [00:02:50] what if um the training data doesn't look very much like the test data [00:02:54] um in these cases we're gonna have a lot of challenges
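The train/test mismatch described here can be made concrete with a small sketch. This is not from the talk — the data, the nearest-centroid classifier, and the amount of shift are all made up for illustration — but it shows the basic phenomenon: a model fit on one distribution degrades sharply when evaluated on a shifted copy of it.

```python
# Illustrative sketch (not from the lecture) of accuracy degrading under
# distribution shift, using a nearest-centroid classifier on synthetic data.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, shift=0.0):
    """Two classes drawn from unit Gaussians; `shift` moves the whole test
    distribution away from the training one."""
    y = rng.integers(0, 2, size=n)
    centers = np.array([[-2.0, 0.0], [2.0, 0.0]])
    x = centers[y] + rng.normal(size=(n, 2)) + shift
    return x, y

# "Train" on the source distribution: store one centroid per class.
x_train, y_train = make_data(1000)
centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in (0, 1)])

def accuracy(x, y):
    # Predict the class of the nearest training centroid.
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return (dists.argmin(axis=1) == y).mean()

x_iid, y_iid = make_data(1000, shift=0.0)  # test data "looks like" training
x_ood, y_ood = make_data(1000, shift=3.0)  # shifted test distribution
print(accuracy(x_iid, y_iid))  # high: source and target match
print(accuracy(x_ood, y_ood))  # much lower: distribution shift
```

The same decision rule that is near-perfect in-distribution is close to random on the shifted data, which is the pattern the rest of the talk keeps returning to.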
so on the image that i put here you know in the [00:02:59] source domain we have these like black and white images and sort of desaturated [00:03:04] settings and now at test time you have these yellow cabs in new york um and you [00:03:08] know your predictions might not work so well once you have this what's called [00:03:11] distribution shift [00:03:13] and so once we start to think about going beyond just sort of data that [00:03:18] looks like the training data we see a lot of problems on the horizon and we've [00:03:20] discovered a lot of these problems beyond test accuracy [00:03:25] and i'm going to at the beginning of this talk cover sort of three classes of [00:03:28] problems [00:03:29] that hopefully [00:03:30] you'll think about as you sort of continue on your journey in ai and [00:03:33] machine learning the first one is sort of discrimination [00:03:37] and performance on minorities [00:03:39] another one is vulnerability to adversaries in high stakes secure [00:03:44] applications um and then the last one which [00:03:47] is a little bit more abstract but i will get into this in more detail is that [00:03:51] models don't really display an understanding of the tasks that they're [00:03:55] actually performing and this is going to be a little bit abstract but because this is an [00:03:58] ai focused class i think this is an important thing uh to be discussing and [00:04:02] going through [00:04:03] and so sort of the unifying theme like these seem like very different problems [00:04:06] right like problems that machine learning systems have today [00:04:09] but really they're all sort of connected with a single underlying theme which is that [00:04:14] many of these problems can be cast as problems in robustness [00:04:18] and so when the training distribution and the test distribution are different [00:04:21] these models break down because they're brittle [00:04:25] so to start with let's talk about sort of discrimination and fairness and [00:04:29] minority groups [00:04:30] so [00:04:31] a really typical thing that happens in a lot of machine learning systems today is [00:04:36] that there's sort of a majority group let's say you know western cultures [00:04:40] english text [00:04:41] or [00:04:42] sort of males in many cases so in this majority group that dominates the [00:04:46] training data you get extremely good superhuman performance in these systems [00:04:51] and often you're going to be deploying this to a wide variety of users and so [00:04:55] you will have minorities using your system [00:04:58] and in these cases you end up with horrible sort of near random performance [00:05:02] and you can sort of immediately see how this is a discrimination issue and sort [00:05:05] of an equity issue [00:05:07] and i'm going to go over a lot of these examples in turn but these just show up [00:05:10] in all sorts of places that you might not initially think about when you think [00:05:13] about fairness problems like say dependency parsing or video captioning
face recognition is a very common one [00:05:19] that people probably already know [00:05:21] but in these sorts of like common widely deployed ml systems you start to see [00:05:25] these gaps between how these systems perform [00:05:28] on majority groups versus minority groups [00:05:34] so the first one that i think is probably maybe surprising to many people [00:05:38] that there's these kinds of gaps [00:05:40] is a task called dependency parsing so [00:05:42] the input is just sort of sentences tokenized and sort of split up so an [00:05:47] example here is bills on ports and immigration were submitted by senator [00:05:50] brownback republican of kansas um and [00:05:53] the output is that you're supposed to analyze sort of the syntactic structure [00:05:56] of this sentence and create dependencies between what are called headwords uh and [00:06:00] their dependents and so you end up with what looks like a tree here [00:06:04] um and so the sentence above like the bills on ports and so on um can be [00:06:07] parsed into this v-shaped uh structure [00:06:10] here on the bottom and so this is called dependency parsing because there's these [00:06:14] explicit dependencies uh between tokens that show up um in your data [00:06:19] and in sort of classical nlp pipelines [00:06:22] such as say if you want to extract relations between uh people or entities [00:06:26] you know who was the person that submitted the bill in the sentence for [00:06:28] example you might use something like a dependency parser to look at [00:06:32] dependencies in your sentence and to extract relations right so this is sort [00:06:35] of a first step in terms of uh getting these kinds of more sophisticated [00:06:39] analyses in these sort of classical pipelines nowadays many things are sort [00:06:42] of end-to-end and neural um but that's sort of beside the point here [00:06:47] and what's sort of surprising or maybe not surprising if you've thought about [00:06:50] these kinds of problems is that these parsers do much much worse on [00:06:55] data that's not commonly used to train these dependency parsers so this is a [00:07:00] study from [00:07:01] su lin blodgett in 2016 where they took a bunch of different dependency parsers [00:07:06] and applied them to text [00:07:09] from standard american english as well as african-american vernacular and [00:07:13] that's the column labeled [00:07:14] aave [00:07:16] and the performance here is measured by what's called labeled attachment score so [00:07:20] that's how well do you reconstruct the tree [00:07:22] and the numbers here you know you might not really know how to internalize this [00:07:25] but you see these big gaps right so in terms of [00:07:28] standard american english you get these 57 uh sort of f1 score type accuracy and [00:07:33] then 43 on african-american vernacular and you get [00:07:36] a 14-point gap and sort of state-of-the-art for this task say you [00:07:39] know you're competing over like a one [00:07:40] point difference so these are enormous gaps uh once you go from standard american english to african-american vernacular
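The labeled attachment score mentioned here is simple to compute: for each token, the predicted head and the predicted dependency label must both match the gold parse. A minimal sketch follows — the toy sentence and its gold/predicted parses are invented for illustration, not taken from the study.

```python
# Labeled attachment score (LAS): fraction of tokens whose (head, label)
# pair exactly matches the gold parse. Checking heads only would give the
# unlabeled attachment score (UAS).
def las(gold, pred):
    """gold/pred: lists of (head_index, label) pairs, one per token."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)

# Toy 3-token sentence "bills were submitted" with hypothetical parses;
# head index 0 denotes the root.
gold = [(3, "nsubj:pass"), (3, "aux:pass"), (0, "root")]
pred = [(3, "nsubj:pass"), (3, "aux"), (0, "root")]  # one wrong label
print(las(gold, pred))  # 2 of 3 tokens fully correct
```

With scoring this coarse, a 14-point LAS gap between dialects means roughly one extra token in seven gets a wrong head or label, which compounds badly in downstream relation extraction.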
[00:07:47] and these kinds of things can have huge downstream impact um if they're used in [00:07:51] things like relation extraction or qa systems right because texts from [00:07:55] african-americans are just systematically not going to get [00:07:58] extracted into say relations or entities [00:08:00] when you build knowledge bases and things like that [00:08:03] and so you might sort of see how this begins to affect these kinds of minority [00:08:07] groups [00:08:09] through these kinds of robustness problems [00:08:14] another example [00:08:16] is video captioning so many of you have already interacted with systems like [00:08:20] this through youtube's video captioning system [00:08:23] where the input is you know you have a video with some spoken text audio and [00:08:27] the output is text captions that are automatically added to the video [00:08:33] and these things are increasingly important because say if you have um [00:08:37] i know that in medical domains if you [00:08:39] have medicaid funded sort of videos that you need to put up on the internet you [00:08:42] need to have captions and so in these cases you either run these [00:08:46] systems or you have people transcribe the videos [00:08:51] and what's been found is that these kinds of systems work a lot worse for [00:08:54] women um so this is a study by rachael [00:08:56] tatman in 2017 [00:08:58] where she basically showed that if you took uh male versus female speakers and [00:09:02] you ran them through [00:09:04] youtube's video captioning system you get systematically higher error rates [00:09:08] for women and you see that sort of the median error rate is essentially the [00:09:11] upper quartile error rate for men so [00:09:13] that's actually a pretty substantial difference in the word error rate [00:09:16] between these two groups [00:09:18] and you also see sort of expected differences between dialects which is [00:09:21] you know scottish speakers get substantially worse [00:09:24] um video captioning accuracy uh whereas [00:09:27] you know speakers from california uh get really good word error rates and so you [00:09:31] know you can sort of see how this sort of manifests right youtube being based [00:09:34] in california obviously dogfooded with people with californian accents [00:09:39] and when tested out of distribution on scottish speakers suddenly performs a [00:09:42] lot worse [00:09:44] and so these are the kind of robustness problems that you initially don't think [00:09:47] about because you sort of think about [00:09:49] well is our model performing well on you know really a complex uh input and so [00:09:55] you might put in some really complex inputs as a california speaker but [00:09:59] really you haven't tested out of distribution on scottish accents [00:10:05] and then we'll come to another example which many of you hopefully already know [00:10:09] in facial recognition this has sort of been really widely discussed even in the [00:10:13] media
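The word error rate behind these captioning comparisons is the word-level edit distance (substitutions, insertions, deletions) between the reference transcript and the system output, divided by the reference length. A standard dynamic-programming sketch, with invented example sentences:

```python
# Word error rate (WER) via word-level Levenshtein distance.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,            # substitution (or match)
                           dp[i - 1][j] + 1,   # deletion
                           dp[i][j - 1] + 1)   # insertion
    return dp[-1][-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 sub / 6 words
```

Comparing the distribution of per-video WER across speaker groups, as the study did, is what surfaces the median-versus-upper-quartile gap described above.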
and just to go over what the task [00:10:15] is the input [00:10:17] is images um [00:10:19] possibly containing a face or not depending on the task [00:10:23] and you can do many sort of things with these images and there are many outputs [00:10:26] that are associated with face recognition or identification tasks [00:10:30] and so you might ask is there a face in this image and that's sort of face uh [00:10:34] sorry recognition um you might need to match a given face to uh database of [00:10:39] faces and that would be identification [00:10:41] um or you might need to predict attributes is this um [00:10:45] face [00:10:46] a female face or a male face or you know happy or sad you have many sort of [00:10:50] attribute prediction tasks that can be built on top of faces [00:10:54] and [00:10:56] this is one of the original studies i think in terms of highlighting [00:11:00] how bad these kinds of systems can be in sort of widespread ways and so [00:11:05] there's a study from the mit media lab [00:11:07] gender shades by joy buolamwini [00:11:10] in 2018 [00:11:12] where she basically took a whole bunch of um portraits of [00:11:16] legislators from different countries [00:11:18] african and i think northern european [00:11:21] and ran them through [00:11:22] different face [00:11:24] attribute prediction systems for whether or not they were male or female and what [00:11:28] you can sort of see on this uh on the top right [00:11:31] is that dark female skin uh results in much worse [00:11:36] gender predictions compared to light skinned males where you basically [00:11:40] have perfect prediction [00:11:43] and these kinds of things are pretty problematic if you've been [00:11:46] testing your systems on light-skinned people you think your system is near [00:11:50] perfect and so you might be using it for really high-stakes tasks [00:11:55] where you need 100 percent performance but [00:11:56] then when applied to these sort of darker skinned uh demographic groups you [00:12:01] end up with substantially worse performance and so [00:12:04] you don't even realize um the kinds of harms that you're causing by using these [00:12:08] kinds of systems [00:12:11] and what's sort of problematic and sort of you can see is that they reflect a [00:12:15] lot of the benchmark data that's been constructed for this task and so on the [00:12:18] bottom right here [00:12:20] um you see [00:12:22] the distribution of sort of skin color and gender for benchmark data sets in [00:12:28] this kind of like sort of gender um identification from face image tasks [00:12:33] and what you see is that there's sort of a systematic underrepresentation of both [00:12:37] females and darker skinned demographics [00:12:42] and you might say you know this really just reflects the underlying data [00:12:46] distribution and so maybe all we need to get is you know unbiased data you hear [00:12:49] this term a lot from i think people who [00:12:53] um haven't thought too deeply about problems of robustness but the issue is [00:12:57] that there's really no such thing as truly unbiased data in the sense that [00:13:00] there will always be an underrepresented group if you slice your data fine enough
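The methodological point behind Gender Shades is disaggregated evaluation: report accuracy per demographic slice instead of one aggregate number, because the aggregate can look near-perfect while a small slice fails badly. A minimal sketch — the group names and records below are entirely made up for illustration:

```python
# Disaggregated (per-group) accuracy, the evaluation idea behind studies
# like Gender Shades. The data here is purely illustrative.
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, gold_label, predicted_label) triples."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, gold, pred in records:
        totals[group] += 1
        hits[group] += (gold == pred)
    return {g: hits[g] / totals[g] for g in totals}

records = [
    ("lighter_male", "male", "male"),
    ("lighter_male", "male", "male"),
    ("darker_female", "female", "male"),   # the kind of error the study found
    ("darker_female", "female", "female"),
]
print(accuracy_by_group(records))
```

Note that overall accuracy on these four records is 75 percent, which hides the fact that one slice is at 100 percent and the other at 50 — exactly the gap a single aggregate metric would miss.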
such thing as [00:12:59] that there's really no such thing as truly unbiased data in the sense that [00:13:00] truly unbiased data in the sense that there will always be an underrepresented [00:13:02] there will always be an underrepresented group if you slice your data fine enough [00:13:04] group if you slice your data fine enough so we need to really just go beyond [00:13:05] so we need to really just go beyond thinking about balancing the data set [00:13:07] thinking about balancing the data set and we need to think about how can we [00:13:08] and we need to think about how can we make our models work well [00:13:10] make our models work well even for really small groups really [00:13:11] even for really small groups really small demographic groups and even [00:13:12] small demographic groups and even individuals [00:13:16] another task [00:13:18] another task that has these kinds of issues is [00:13:20] that has these kinds of issues is language identification so as an input [00:13:22] language identification so as an input you might be uh working at twitter and [00:13:24] you might be uh working at twitter and you need to identify the language of a [00:13:25] you need to identify the language of a tweet so that you can run a machine [00:13:27] tweet so that you can run a machine translation system and automatically [00:13:28] translation system and automatically translate a tweet into someone's uh sort [00:13:31] translate a tweet into someone's uh sort of speaking language [00:13:33] of speaking language but in order to do this you need to [00:13:35] but in order to do this you need to first identify what text the tweet is [00:13:37] first identify what text the tweet is written in right and so you might have a [00:13:39] written in right and so you might have a lot of different kinds of inputs and [00:13:41] lot of different kinds of inputs and this figure one shows the challenge in [00:13:43] this figure one shows the challenge in this task [00:13:44] this 
So you might have dialectal text: the top one is in Nigerian English, the second one is Irish tweets, and in the last one you have code-switching, a mix of both Indonesian and English. In language identification, when you're given these kinds of tweets, you need to identify the source language that they were written in, and so the output of the task is the language.
[00:14:08] What's been identified is that there are, once again, systematic biases in language identification, and one that's immediately a little troubling is that African-American English often gets identified as not English. So there's an implicit normative judgment being made here, that African-American vernacular is not English, and you see this error right here, with AAE having almost double the error rate of language identification compared to a
more standard American English data set.
[00:14:37] You also see this across languages. This is a study by Jurgens et al. in 2017, where if you sort the languages by the Human Development Index of their countries, you see this decreasing recall, decreasing accuracy, as the countries get less and less developed. That's because these countries often have under-resourced data sets, so there isn't as much data with which to train these language identification systems. So you see systematic biases in terms of how well developed and how internet-connected these countries are.
[00:15:11] And this leads to representational harms, right? If you're an African-American English speaker and a system tells you that you're not speaking English, that's kind of harmful. And there's also utility harms, right?
Like, if your text doesn't automatically get translated to English, your tweets won't reach as wide of an audience. So you can think of these as having pretty serious implications for fairness as machine learning becomes more widespread, more useful, and more impactful.
[00:15:38] And so there are these problems of serious, active discrimination. There was a story in the New York Times where a face recognition system identified a person as being a criminal, and this was faulty, and it was essentially the only reason for arresting this Michigan man. So if you have a system that's much more error-prone on African Americans, you're basically going to have a much higher error rate when deploying these kinds of algorithms. So you have these active discriminations and harms that are being
done. [00:16:09] But on the other side, right, as people taking and studying machine learning, we think that these kinds of technologies are broadly beneficial and useful, and increase efficiency. There's a study by Erik Brynjolfsson which found that the application of machine translation increased exports on eBay by 17.5%, because it's really easy to translate text, so people from other countries can buy your product. But if, for example, language identifiers can't identify your language, and so you can't use machine translation systems, then you don't get these benefits, right? So you get unequal access to the fruits of these kinds of AI systems. And so this can lead to harms in both ways: you don't get access to the benefits, and you get these kinds of active harms from the errors that these systems make.
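The per-group disparities described above are usually surfaced by an audit in the style of Gender Shades: compute the error rate separately for each demographic slice and compare. A minimal sketch with made-up labels and groups (nothing here comes from the actual studies):

```python
from collections import defaultdict

def group_error_rates(y_true, y_pred, groups):
    """Classifier error rate broken down by group label."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        errors[g] += (t != p)
    return {g: errors[g] / totals[g] for g in totals}

# Synthetic audit: the classifier errs on 1 of 4 examples from group "a"
# but on 2 of 4 examples from group "b".
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = group_error_rates(y_true, y_pred, groups)  # {"a": 0.25, "b": 0.5}
```

Because the group label can be any hashable value, passing tuples such as `("dark", "female")` audits intersections with the same function, which is how finer and finer slices of the data get checked.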
[00:16:59] And I'm gonna stop here, because I think fairness is a topic that many people have feelings and comments about, and I'd be happy to just sit and discuss for the next couple of minutes if anyone has questions about how fairness and these robustness questions interact with each other. (Yeah, there's a bunch of questions in the chat.) Oh, yes, sorry, I have full screen, so give me a moment to pull up the chat. Okay, I see it now.
[00:17:32] I'll start with the first question, which is about how having balanced, unbiased data is not enough. This is a very subtle point. There are data sets you can construct that will make you robust to certain kinds of groups, right? So let's go back to just this slide. [00:17:51] So if we look at this sort of
distribution of data, it's clear that, at least for the top one, we're probably going to have some bias in terms of light versus dark skin, because dark skin is so underrepresented. But if we balance this data out, we might still have unbalanced demographics in certain other areas, right? Maybe it's not dark versus light skin; maybe it's geographic region, or it might be income. These kinds of problems are innumerable. So what you really need is not the search for this unreachable, perfectly balanced data set, but a model that can do well on small amounts of data. You want a model that can take in these kinds of imbalanced data and do well on both the dark and the light skin. And the important thing in this task
is that there's no real trade-off, right? There's no real reason that you can't do well on both the light and the dark skin, and I think that's the crucial structure here: if you can do well on both groups, then it's not really about the amount of data or the distribution of data, it's more about the model and how you're learning it.
[00:18:58] The second question: is there a way to audit models without having access to the model? That's an interesting question. I'm not sure if you meant access to the model's outputs or something else. If you have access to the model's outputs, you can perform a study like Gender Shades, where you run the model on certain challenge examples, look at what the error rate is, and say: well, clearly we are doing much worse on dark-skinned females than on light-skinned males, so there's some sort of
bias. So you can audit models that way. It becomes much harder to audit models if you can't execute the model on your own data; then you have to do something a little bit tricky, and I think it requires very specialized conditions to be able to audit those kinds of models.
[00:19:46] Also, feel free to ask follow-up questions if I didn't answer any of these. So: similar to the issue with the person in Michigan, there have been efforts in applying AI to model future problematic human behavior. Oh, this is a comment. Yes, and that's highly problematic. I think in one of the earlier talks there was a discussion about how amplification and feedback effects are really insidious. And yeah, predicting the future, and acting on predicted future behavior, is even more problematic than the tasks I described here, because acting on the
real world will change the outcome, right? If you predict that crime will happen in a certain area, you assign more police, and you find more crime, and that's going to lead to a pretty vicious feedback loop. So you really need to think about the whole socio-technical system, rather than just the classification system narrowly, when you're in those settings.
[00:20:38] The last one: it seems that we can always slice our data into smaller subpopulations to test for fairness; are there industry standards for what we should usually start with? That's a great question, and also a really important academic one. There is an easy answer, which is that a lot of research and a lot of industry work has focused on legally protected groups. That's a well-defined set of attributes that you can't discriminate on, and so you can group by those, you can group
by intersections of those, and you can say those are the groups I shouldn't discriminate on. But academically this seems unsatisfying, because why should those be the only things we care about? There's also a lot of work on individualized fairness: making sure that you do well on individual people, making sure you treat similar people similarly, and things like that. That's a whole active area of research, and not really something where there's an obvious and clear answer yet.
[00:21:32] Okay. Any other final questions before I move on?
[00:21:42] Okay, so now I'm going to move on to the second point that I talked about before: that machine learning systems aren't really secure and can't really be used in many high-stakes situations. I'm going to start with one of the most well-known examples of this, called adversarial
examples. [00:22:02] On the left we have an image, this is a panda, and a classification system gets this mostly right: it's a panda with 57% confidence. That's great. Now what we're going to do is add a very specially designed and visually imperceptible perturbation. This middle panel looks like complete noise; we scale it down so that it looks just like zeros, and then we add it to the panda image and get the image on the right. Now we run our image classifier, and what we get out is that it's almost certainly a gibbon, which is completely wrong.
[00:22:33] So what this tells us is that we can find visually imperceptible perturbations that lead to very confident misclassifications. I'm not going to show you all the results of this adversarial-example work, but you can do this to almost any system, and you can completely and catastrophically destroy the accuracy of
all of these systems. This also happens in NLP systems and so on, so this is a really hard-to-avoid and almost universal behavior, and I want to show you how robust this kind of behavior is.
[00:23:04] It doesn't have to be images on a computer screen. It can happen by putting little black-and-white patches on a stop sign: on the left, a system is going to classify that as a yield sign instead of a stop sign. The middle one is a fun 3D-printed turtle, where if you run an object recognizer, it will say "gun" from almost any angle. And the right one is an adversarial sticker, where if you stick it anywhere and take an image, it's going to say that it's a toaster, instead of a banana, which is what it should be. So these come in very many different formats, but you have this same and kind of
disturbing phenomenon, where it's obvious to us that something like this shouldn't be tricking us. Black-and-white patches that small, or a weird texture on a turtle, shouldn't really fool us into changing our predictions, but it really fools these image classifiers.
[00:23:57] When you first see this, you think there must be a really simple fix, right? Maybe you run it through a JPEG compressor; maybe you add a little bit of extra noise to every image. And so this has led to an enormous number of papers, like a hundred or so over the last five or six years, in which people have tried a lot of different things to defend against these kinds of what are called adversarial perturbations. But the problem is that every time someone comes up with a defense, soon after someone breaks it by finding a better attack, or even
somewhat more disturbingly, just running the old attack for longer. So it kind of seems like this is a really persistent and serious phenomenon.
[00:24:42] I think the recent view of a lot of these adversarial-example-type problems is not that there's some really degenerate artifact in the way we train models or the way we optimize things. It's really just the fact that there are a lot of ways to build a high-performance prediction system, and many of the ways in which we can predict accurately rely on what we're going to call non-robust features. When we try to classify, say, a dog or a cat, we as humans rely on what we're going to term robust features: we try to identify the eyes and the snout and those parts, and these kinds of things are pretty robust to pixel-level perturbations.
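How a pixel-level perturbation exploits the gap between robust and non-robust features can be sketched with the classic fast-gradient-sign construction on a toy model. The logistic "classifier" below is made up for illustration, with random weights standing in for a trained network: every pixel moves by at most 0.2, yet the prediction flips from confidently one class to confidently the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image" classifier: logistic regression over 64 flattened pixels.
d = 64
w = rng.normal(size=d)  # stand-in weights; a trained net behaves analogously

def prob_class1(x):
    """P(class 1 | x) under the toy logistic model."""
    return 1.0 / (1.0 + np.exp(-(w @ x)))

# An input the model classifies confidently as class 0.
x = -0.1 * np.sign(w)
p_before = prob_class1(x)  # confidently class 0

# Fast-gradient-sign step: for this model the gradient of the class-1
# logit with respect to the input is just w, so every pixel is nudged
# by epsilon in the direction sign(w).
epsilon = 0.2
x_adv = x + epsilon * np.sign(w)
p_after = prob_class1(x_adv)  # now confidently class 1
```

For a deep network the input gradient comes from backpropagation rather than a closed form, but the mechanics, and the outsized effect of a uniformly tiny perturbation, are the same.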
But actually, low-level textures and really small image patches are also very predictive of the classes, dogs versus cats, let's say. And who are we to say that that's an incorrect way to make the predictions? Because when we train the model, all we're saying is: just classify these dogs and cats well. So you can think of this as saying our problem is underspecified. There are many ways to get a good classifier, and some of them really rely on the use of these non-robust features.
[00:25:59] And this has kind of serious security implications, right? If you're trying to make a self-driving car system, the stop sign being classified as a yield sign is pretty bad; you might run over a pedestrian. And this really prevents the application of machine learning systems in things like self-driving
cars, or at least we should be very hesitant, if we believe that these kinds of problems are inherent. Right, because the world is kind of designed so that stop signs are really easily perceptible to humans, and not necessarily designed so that small perturbations, say by putting on stickers, don't change stop signs into yield signs. And in other cases, vision systems, I think, are increasingly being used in high-stakes applications. We might reasonably imagine, say, at a TSA checkpoint, there's a camera running that tries to identify whether or not you have a gun. And if you can make these adversarial examples that, say, make a gun not a gun, or a turtle a gun, that seems very problematic: we can't use vision systems for those kinds of high-stakes applications that we might want to use them for.
[00:27:04] And so both of these really
pose challenges for the use of machine learning in these high-stakes, life-or-death kinds of settings. And I'm gonna stop here to take questions about adversarial examples for the next couple of minutes.
[00:27:22] Do I know why the first example was classified as a yield sign? That's a good question. With all these adversarial examples, the reason why they're being classified as yield is pretty confusing, I think. For example, why is this turtle being classified as a gun? I'm really not sure; it doesn't look anything like a gun, and the textures don't look like a gun. The way these things are constructed is by an optimization process: you're basically looking for perturbations on, say, a normal turtle texture that lead it to being classified as a gun. So there's no real interpretable reason why, say, this looks like a yield sign, or
this is being classified as a gun from every angle.

[00:28:06] Can I train on these examples to correct the classification? Oh, sorry, I skipped a question: how do we define a non-robust feature versus a robust one? Ah, yes. This is an ad hoc definition: the split here is just whether or not a feature can be flipped by changing the image slightly in pixel space. That's really the working definition of robust versus non-robust. If we were being more precise, the split should really be that visually imperceptible features are non-robust and visually perceptible features are possibly robust. I think that's a pretty reasonable split: anything that humans really cannot tell apart visually should not be used as a feature, as an input to a reliable prediction system.

[00:28:57] Can I train on these examples to correct the classification? I'm going to try to
interpret this question, because I'm not one hundred percent sure. I think what you're describing is an idea called adversarial training. The idea is that instead of training on just the input image, let's go back a little bit, instead of training on the panda, we try to train our system to classify this image, the adversarial image, as being a panda. And you might think, okay, this is good, we can now make this a panda. But now we need to prevent some other sort of adversarially designed noise from fooling the system again, because there are probably many, many different attacks that will change this image's prediction. So the idea you're describing is basically called adversarial training, and yes, that's one of the earliest defense approaches, and it's empirically reasonably effective.
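The two ideas in play here, the gradient-based search for a perturbation mentioned a moment ago and the adversarial-training loop just described, can be sketched in a few lines. This is a minimal toy illustration, not the systems from the lecture: the model is a bare logistic regression, the one-step sign-of-gradient attack is the FGSM idea from Goodfellow et al., and all the sizes (20 "pixels", eps = 0.05, the learning rate) are invented for the demo.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def fgsm(X, y, w, eps):
    """One-step sign-of-gradient attack on a logistic-regression model.

    For cross-entropy loss the gradient w.r.t. the input is
    (sigmoid(x.w) - y) * w, so each "pixel" is nudged by +/- eps in
    whichever direction increases the loss: a visually tiny change."""
    grad_x = (sigmoid(X @ w) - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad_x)

def adversarial_train(X, y, eps, lr=0.2, epochs=300):
    """Adversarial training: every gradient step is computed on the
    FGSM-perturbed batch (the inner attack) instead of the clean one."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        X_adv = fgsm(X, y, w, eps)
        w -= lr * X_adv.T @ (sigmoid(X_adv @ w) - y) / len(y)
    return w

# Toy data: 20 "pixels", label = sign of the pixel sum (all made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = (X.sum(axis=1) > 0).astype(float)
eps = 0.05

w = adversarial_train(X, y, eps)

X_te = rng.normal(size=(400, 20))
y_te = (X_te.sum(axis=1) > 0).astype(float)
acc = lambda A, b: float(np.mean(((A @ b_w) > 0) == (b == 1))) if False else float(np.mean(((A @ w) > 0) == (b == 1)))

print(acc(X_te, y_te))                       # clean test accuracy: high
print(acc(fgsm(X_te, y_te, w, eps), y_te))   # under the same attack: lower
```

Consistent with the answer above, the adversarially trained model keeps decent accuracy under this particular attack but is not certified against stronger, multi-step attacks; it is better than nothing, not foolproof.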
But you can still attack it with more sophisticated methods: you can find still visually imperceptible attacks after adversarial training that break the system. So this is not really a foolproof way of making models more robust, but it's better than nothing.

[00:30:12] Are you saying yet-unfound defenses are needed for ML self-driving cars to be secure from nefarious attacks? I think this is a great question, and I think I was being a little too aggressive in the things I was saying. It's an open question whether these kinds of attacks are really feasible in the real world, or whether they're things we should worry about. In the real world I can easily cut off a stop sign using a saw; that's an adversarial human attack, but we're not too worried about that attack, and so
maybe we shouldn't be worried about adversarial attacks on self-driving car systems. But I think there are two things this highlights. One is that we should be a little careful when we deploy these self-driving car systems: we should have fail-safes that, for example, don't rely just on vision. That seems pretty important. We might want to use radar or lidar (radar doesn't work well on soft objects like people, say, so lidar) to try to make sure we're not going to run people over when we mis-detect a stop sign. Having lots of orthogonal checks becomes increasingly important once you realize there are ways to fool these vision systems. And people are working on provably robust machine learning systems; maybe in settings like military applications those become truly important. And so
there is progress on that, but provably robust systems achieve much worse average accuracy than non-robust systems, so there's this big gap right now that we don't really know how to close.

[00:31:41] Okay: in my opinion, is research now shifting toward reformulating models to rely on robust features instead of finding ad hoc defenses? That's a good question. I think there is still a big gap between provably robust defenses and what you might call ad hoc defenses that work well for, say, one or two targeted attack types. But there are now things like randomized smoothing, procedures that in some sense get the best of both worlds: they're provably correct in some framework, and they're getting increasingly better. And so I think, for
high-stakes applications, we'll end up in a place where we'll lose some average-case accuracy, but not catastrophically so, and we'll still have adversarially robust models. That's where I imagine the field will go. It does seem like ad hoc defenses keep getting broken, so that's not really a path toward truly robust systems, even though they might make for more useful systems overall if, for example, adversarial training leads to more interpretable latent features.

[00:32:45] Does producing adversarial attacks require access to the model? If so, isn't this just an issue of info security, equivalent of... I can't parse that second sentence, but yes, I agree with the general sentiment. If you need access to the internals of the model, then really at that point you've rooted the system: if you're attacking, say, a medical imaging system
or you have access to someone's car, and if you're the Mossad, you can probably just mess with their brakes. So it's true that those attack models are pretty obscure and weird. But for one, there are what are called black-box attacks, which only require being able to evaluate the model. And for two, a lot of systems are shared, so you only need to learn to attack them once. If you're trying to attack Tesla's auto-driving system, you get a Tesla, you figure out the adversarial sticker that's going to make the system go haywire, and then you paste that everywhere. That doesn't require a particularly sophisticated threat model to execute. So I think there are threat models in which these attacks are real and problematic, even though there's a lot of questions and debate
about whether or not we should really care about this, the cost-benefit trade-offs of robust versus non-robust systems and so on, but it's an important thing to keep in mind.

[00:34:12] Any other questions on the adversarial examples part?

[00:34:22] Okay. And so now I'm going to get to the last part, which, given that this is an AI class, is maybe the most important of the three failures of machine learning. The previous one was about robustness; I think this one is about understanding. People throw this word around a lot, that models don't really understand, and it's hard to pin down what understanding is, but it's very easy to show when models don't understand. So we can go through some examples here, and we'll come back to them in more detail. This is from an overview of what people call shortcuts, cited at the bottom.
For example, let's say we're trying to caption this image here, so we need to describe in text what's happening. But sometimes these systems might just use the background instead of actually recognizing hills and skies and sheep. Adversarial examples arise maybe because we're recognizing textures and not actually recognizing the shape of things like teapots. And if we're doing medical image diagnostics, we might be looking at markings on X-rays, like which hospital the X-ray came from, instead of actually performing the prediction. In all these cases we're making use of pieces of information that shouldn't really be central to the task: they're not the core prediction tasks we care about, and somehow the model has picked them up and learned to do really well. I'm just going to group this broadly under the label of
shortcut learning.

The way to think about this is that when we train models in machine learning, we're training them to do well directly on the training set, and these days we now expect them to do well on the test set. But what we would ideally like, from systems that reason and understand and so on, is that they perform well on challenge sets: really difficult examples that we've constructed to break the model. You can think of it this way: there are a lot of possible rules that work well on the training set, fewer that work well on test sets, and very, very few that work well on these challenge sets, which capture the intended, true reasoning we would like our models to extract. And so we can think about machine learning today as: we've gone
from this tan-colored circle, where we were before, to this blue-colored circle, where we are now, where we do well on the test set. But where we really want to be is still further: we want to make sure we learn the right sort of mechanism. You may have heard the classic AI story of tank identification; I think this is a really old, Cold War kind of story. I'm going to read it out loud: the army trained a program to differentiate American tanks from Russian tanks. It got 100% accuracy on a test set, but later people realized that the American tanks had been photographed on a sunny day and the Russian tanks on a cloudy day. The computer had learned to detect brightness, not to detect tanks. And so this is exactly the kind of problem I'm talking about, where we have this extremely high
test accuracy and we are super happy, but then we realize we haven't learned anything about the underlying task. This story has been attributed to a lot of different people; I think it originally appeared in written form in Dreyfus's textbook. But it turns out it's not an actual real example: it can't be attributed to any actual experiment run by the army. The citation here is Gwern, who has a website where he has gone through all the possible attributions of this urban myth. The myth is so popular, at least in the AI and machine learning community, I think because there's a kernel of truth to the story. And we're going to go through several examples today of tasks with these kinds of failures.

[00:38:04] One that's kind of fun is this vision task where, apparently, people have tried to predict
gender from iris patterns. There was apparently some belief that this is a task you can perform, because you can actually get reasonably high test accuracy if you train CNNs on cropped images of irises and try to predict gender. But there's a paper that identified that this is not actually because of the iris: it's because female eyes often have mascara, and that systematically shifts the brightness of the images. This histogram tells a thousand words in one image. On the top you see the distribution for males, where the x-axis is the average brightness of the image. The distributions look pretty similar: males and females have a similar brightness distribution when the females have no cosmetics on. But if you restrict yourself to females with cosmetics,
this red distribution shifts to the left and the images become darker. So you see there's a very strong confounding effect: the female eyes have mascara and are therefore darker, and therefore these systems predict quite well based on this average darkness, even though apparently they weren't learning anything at all about the actual prediction task.

[00:39:26] Another one, from the Gwern website and its investigation into this tank phenomenon, which is interesting, is the Kaggle fisheries competition. You're given images of fish being caught on a fishing boat, and the task is to identify whether these boats are catching fish illegally: you're supposed to identify whether the fish are part of a protected category of fish you're not supposed to catch.
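The brightness-style confound above can be reproduced in miniature. This is a hypothetical sketch with invented numbers, not any of the actual studies: feature 1 plays the role of the spurious cue (say, image brightness), it perfectly tracks the label at training time, so a plain logistic regression leans on it, does great on an i.i.d. test set, and then degrades badly on a "challenge set" where the correlation is broken.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, lr=0.5, epochs=500):
    """Plain logistic regression fit by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def make_data(n, shortcut_tracks_label):
    """Feature 0: a weak 'real' signal (the iris, the fish).
    Feature 1: a shortcut (brightness) that either perfectly tracks
    the label or is random, depending on the flag."""
    y = rng.integers(0, 2, n).astype(float)
    real = (2 * y - 1) + 2.0 * rng.normal(size=n)   # noisy, weakly predictive
    shortcut = (2 * y - 1) if shortcut_tracks_label else rng.choice([-1.0, 1.0], n)
    return np.column_stack([real, shortcut]), y

X_tr, y_tr = make_data(2000, shortcut_tracks_label=True)
w = train_logreg(X_tr, y_tr)
acc = lambda X, y: float(np.mean(((X @ w) > 0) == (y == 1)))

X_iid, y_iid = make_data(2000, shortcut_tracks_label=True)   # same confound
X_ch, y_ch = make_data(2000, shortcut_tracks_label=False)    # confound broken

print(acc(X_iid, y_iid))  # high: the shortcut "solves" the benchmark
print(acc(X_ch, y_ch))    # much lower: the real signal was never learned well
```

The same mechanics apply to the other examples in this part of the lecture: any cue that separates the training set cleanly, however irrelevant, can dominate what the model learns.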
It turns out that on the training set you can do extremely well on this task using a very simple heuristic. These images come from a relatively small number of boats, so you first identify each boat, and then you identify, for each boat, whether or not it has been catching illegal fish. This approach does really well because it turns out only a few boats catch these illegal types of fish, and so by first identifying the boat and then identifying the fish, you can get extremely high accuracy even though you have learned nothing about actually performing the fish identification task.

[00:40:25] Another one that seems more high-stakes and problematic is in medical prediction. There's a lot of talk about tumor identification, or chest X-ray malignancy prediction, and in these cases it's pretty important to ask whether or not we're
doing well, because in these high-stakes situations you would like to make sure you're not being fooled by some feature that makes the task easier than it actually should be. There are often claims now of these systems performing just as well as human doctors in terms of diagnostic accuracy and so on. One really interesting, and maybe a little problematic, example is skin lesions that you're trying to classify as cancerous or not. Doctors will often put surgical markers on to highlight tumors they think are more serious than others, just so that when someone else looks at them they can immediately identify the more problematic ones. And the training set for these systems apparently contained a lot of these
markings. So there was an examination of these tumor classification systems where researchers artificially added markings to the images, and also cropped markings out of already marked images, and they showed they could basically flip the classification output of these systems. So in some ways the high accuracy of these classification systems is not because they're identifying tumors: it's because they're piggybacking on humans who, in many cases, have already classified the tumors as malignant or not.

[00:42:04] An early problem someone identified, in one of the earlier works here by Esteva et al., is that when people are trying to identify whether tumors are malignant, in serious cases they would include rulers to show how big the tumor is. And so the existence of a ruler would serve as a sort of spurious correlation, or a
sort of spurious correlation or a confounder [00:42:29] confounder in terms of [00:42:31] in terms of uh whether or not a tumor was malignant [00:42:34] uh whether or not a tumor was malignant and finally [00:42:36] and finally one that i think people are now aware [00:42:38] one that i think people are now aware about but initially i think people are [00:42:40] about but initially i think people are um [00:42:41] um sort of not as aware of is that hospital [00:42:44] sort of not as aware of is that hospital id often serves as a really reliable [00:42:46] id often serves as a really reliable indicator of both sort of base risk [00:42:49] indicator of both sort of base risk level as well as the type of procedures [00:42:51] level as well as the type of procedures being performed at a hospital [00:42:54] being performed at a hospital and this you can think of as like the [00:42:55] and this you can think of as like the analogous to the boat example in the [00:42:57] analogous to the boat example in the fishing problem where if you identify [00:43:00] fishing problem where if you identify hospitals that say have a lot of [00:43:03] hospitals that say have a lot of smokers you're going to much more likely [00:43:05] smokers you're going to much more likely find [00:43:06] find cancer in lung chest x-rays from those [00:43:09] cancer in lung chest x-rays from those types of hospitals and so it's really [00:43:11] types of hospitals and so it's really important to try to remove the effect of [00:43:13] important to try to remove the effect of sort of identifying the hospital and [00:43:14] sort of identifying the hospital and then identifying the base risk [00:43:19] um [00:43:20] um a really interesting one i wasn't aware [00:43:22] a really interesting one i wasn't aware of in image classification until [00:43:24] of in image classification until yesterday or so is um pascal voc [00:43:28] yesterday or so is um pascal voc is a pretty common uh object innovation 
[00:43:30] is a pretty common uh object innovation data set and uh [00:43:33] data set and uh a bias that's been identified is that [00:43:34] a bias that's been identified is that the horse class for this [00:43:36] the horse class for this i guess was taken by a single horse [00:43:38] i guess was taken by a single horse photographer who put in watermarks at [00:43:40] photographer who put in watermarks at the bottom left of the image so around [00:43:41] the bottom left of the image so around 20 of the horse images have a watermark [00:43:45] 20 of the horse images have a watermark and reliable classification systems just [00:43:47] and reliable classification systems just learn to pick up on the watermark so you [00:43:49] learn to pick up on the watermark so you can make cars um classified as horses as [00:43:51] can make cars um classified as horses as long as you add the watermark on the [00:43:53] long as you add the watermark on the bottom right [00:43:55] bottom right and so this is something where unless [00:43:57] and so this is something where unless you really carefully looked at the data [00:43:59] you really carefully looked at the data set you probably won't even realize that [00:44:01] set you probably won't even realize that this kind of bias exists until you've [00:44:03] this kind of bias exists until you've actually um [00:44:05] actually um sort of carefully examined an [00:44:06] sort of carefully examined an adversarially examine the data sets that [00:44:09] adversarially examine the data sets that you have [00:44:13] sort of finally [00:44:14] sort of finally there is [00:44:15] there is i've mostly talked about vision [00:44:18] i've mostly talked about vision examples thus far but this is a [00:44:20] examples thus far but this is a this sort of shortcuts and lack of [00:44:22] this sort of shortcuts and lack of understanding [00:44:24] understanding is a problem that's common to [00:44:25] is a problem that's common to every area and 
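A toy version of this watermark shortcut is easy to reproduce. The sketch below uses entirely synthetic data (not the actual Pascal VOC setup): a plain logistic regression is trained where a "watermark" feature correlates perfectly with the label while the "real" signal is noisy, and the trained model then classifies an ambiguous example purely by the watermark.

```python
import math
import random

random.seed(0)
n = 500

# Synthetic training set: label y = 1 stands in for "horse".
ys = [random.randint(0, 1) for _ in range(n)]
# A noisy "real" signal (stand-in for actual horse appearance)...
reals = [y + random.gauss(0.0, 1.5) for y in ys]
# ...and a "watermark" feature that correlates perfectly with the label.
wms = [float(y) for y in ys]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Plain logistic regression trained by batch gradient descent.
w_real, w_wm, b = 0.0, 0.0, 0.0
for _ in range(500):
    g_real = g_wm = g_b = 0.0
    for y, r, m in zip(ys, reals, wms):
        err = sigmoid(w_real * r + w_wm * m + b) - y
        g_real += err * r
        g_wm += err * m
        g_b += err
    w_real -= 0.5 * g_real / n
    w_wm -= 0.5 * g_wm / n
    b -= 0.5 * g_b / n

# An ambiguous image (real signal = 0): the watermark alone decides.
p_with_watermark = sigmoid(w_wm * 1.0 + b)
p_without_watermark = sigmoid(b)
```

Because the watermark feature is noiseless, gradient descent pushes most of the decision onto it, which is exactly the failure mode described above.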
[00:44:27] I'm going to give a probably very well-known example in natural language processing. The task here is entailment prediction: you're given a pair of sentences, one called the premise and the other called the hypothesis. The first might be a sentence like "the economy could be better," and the second a sentence like "the economy has never been better." The goal is to say: does the hypothesis logically follow from the statement made in the premise? If it follows, you say it's entailed; if it's contradicted, you say it's a contradiction; and if it's neither, that's the neutral class. So it's a three-class classification problem.

[00:45:01] The way these examples are constructed is through crowdsourcing: you extract the premise sentence from some large internet or newswire text, and you have a label that you randomly pick. So you say, "I have a premise, and it's going to be a contradiction," and then you ask crowd workers to write down a contradiction, so they write something like "the economy has never been better."

[00:45:27] What happens here is that crowd workers, because they're writing the hypothesis text after seeing the label, have systematic biases, and the bias that's really strong is negation: when something's not entailed, they use negation. So a model will often learn to associate negation, or the lack thereof, with the outcome label, and instead of actually doing the entailment task, models often pick up these negation biases. Even more problematically, systems with what's called a hypothesis-only baseline, where you don't even look at the premise, can do extremely well. And there's no way to genuinely do well on this task while looking just at the hypothesis, because how can you know that the hypothesis is entailed by the premise while looking at only one of them? So this shows the really strong bias that these crowd workers put into this data set.

[00:46:25] And so this has kind of serious implications for the project of pushing machine learning toward understanding and general AI, because thus far all of machine learning has been predicated on benchmark progress. That's the way in which the field has really grown and done well: ImageNet and MNLI and these well-known tasks, where you get everyone together and push on these numbers, and we hope that improvements in these benchmark performances lead to understanding.
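The hypothesis-only negation baseline described above can be sketched in a few lines. The examples and the negation list below are invented for illustration; real data sets like SNLI and MNLI are far larger, but this kind of trick has been shown to score well above chance on them.

```python
# Hypothetical toy illustration of the hypothesis-only negation shortcut.
NEGATIONS = {"not", "no", "never", "nobody", "nothing", "n't"}

def hypothesis_only_baseline(hypothesis: str) -> str:
    # Deliberately never looks at the premise.
    words = set(hypothesis.lower().split())
    return "contradiction" if words & NEGATIONS else "entailment"

# Invented examples mimicking the crowdsourcing bias: contradictions
# written by crowd workers tend to contain negation words.
data = [
    ("The economy could be better", "The economy has never been better", "contradiction"),
    ("A man is playing a guitar", "Nobody is playing music", "contradiction"),
    ("A dog runs through a park", "An animal is outside", "entailment"),
    ("Kids are sitting at school", "Children are in class", "entailment"),
]
accuracy = sum(hypothesis_only_baseline(h) == label for _, h, label in data) / len(data)
```

On this deliberately biased toy set the premise-blind rule is perfect, which is the point: the labels leak through the hypothesis alone.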
[00:46:54] But it's clear that, because of these biases, that may not necessarily be the case, and so we need a different sort of paradigm to link machine learning performance to understanding.

[00:47:05] The other problem that I hope, by going over so many examples, I was able to impress upon you is that there are so many shortcuts. With this negation bias from crowd workers, you wouldn't know about it unless you looked at the data set carefully after being told that there was a bias; with the watermarks on the horses, I don't even know how they found that, given how minor it is. So it becomes really hard to say "we're just going to construct a data set free of shortcuts." When you're told about these shortcuts afterwards they seem really obvious, but how can you construct a shortcut-free data set?

[00:47:45] And so that's the real challenge now: if we think that we can't get data sets free of these shortcuts, these biases, and these minority groups, we need a new way of trying to make sure that our models really learn the right thing.

[00:47:58] I'm going to stop here for a moment to talk about shortcuts and understanding, and hopefully people have lots of questions, because I think this is a fun one in terms of thinking about how machine learning really relates to AI, and so on.

[00:48:15] Sure. Just thinking about committee modeling, and coming back to that stop sign with the patches that was seen as a yield sign: does it make more sense to use one big model trained on every piece of data you can find, or does it make sense to train a bunch of models on some partitions of the data
that might overlap in some way, and then combine those votes, in order to make it less easily fooled?

[00:49:00] Yeah, I think that's a generally good thing to do. I guess there are two answers, and the more general one is to think about the trade-off between model capacity and your ability to fit these minority groups, or these shortcuts. In the idea you're describing, let's say we have 10 or 100 different models and we fit them to different parts of the data: then we might have a model that's dedicated to shortcuts, but we might also have a model that really learns the right thing. And so the more flexible our model, the bigger our model class, the more we can say that part of the model might be dedicated to the shortcuts, but that's okay, because the rest of our model will still learn the right thing. But that's still a hope; there's no real guarantee that this will happen, and if the shortcuts are strong enough, that's what the model will learn. So I think it seems really important to have bigger-capacity models, that's sort of a given, but how can we learn big models well without overfitting? How can we make sure they still learn the right thing? If one part of the model fits the shortcuts, how can we make sure the rest of it learns to do the right sort of prediction without shortcuts? That's the open question, I think, in this area.

[00:50:15] There was a question: for image segmentation, do we have a way to know which parts of the image contribute to the prediction? We
could call it prediction traceability. Yes, so I of course glossed over quite a bit, but this paper, Lapuschkin et al. 2019, is exactly about that: trying to identify, or attribute, predictions to parts of the image using interpretability methods. That's how they found this horse problem, I think: they attributed predictions to locations and found that for horses the predictions were always localized to the bottom left, and it was because of this watermark. So I think a big, important use case of interpretability methods is exactly this: identifying these kinds of shortcuts by attributing predictions to locations in the image, or to subgroups in the data set.

[00:51:12] Going along with the above comment: are there methods for finding which parts of the image, or of a data example, have high weights associated with them? Yes. There are of course many different methods, but Shapley values are a pretty general framework you can think about. The analogy is like this: think about each pixel, or each region or subpart of the image, as participating in the prediction, and ask: when I remove that part of the image, how much does the prediction accuracy go down? You can do that after randomly removing other parts of the image: you randomly drop other pixels, then you drop the part you're interested in, and you ask, on average, how much more accuracy does the part I'm interested in give me? That estimate is called the Shapley value. And Pang Wei, who I think is here, has a paper on approximations to that based on the influence function. So there are all sorts of ways in the interpretability community of doing what's called feature attribution.

[00:52:16] Another question: for a problem with discrimination, would a reasonable approach be to adopt active learning by default, where the model trains with more emphasis on wrongly categorized examples? The hope is that the model could steer itself away from the initial biases over time, or is it not that simple? Yeah, so that is actually one way around it: when you can collect your own data and you have the ability to know what you're getting wrong, then collecting data in the places where you're wrong serves as a negative feedback loop. You get more data where you're wrong, your model gets the training signal it needs to correct, and eventually you'll learn the right thing.
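Going back to the Shapley-value procedure described a moment ago, randomly dropping other features and measuring the average gain from adding back the one you care about is a Monte Carlo estimate of a Shapley value. A minimal sketch, with a made-up linear "model" standing in for a real classifier:

```python
import random

def shapley_estimate(predict, x, feature, n_samples=200, seed=0):
    """Monte Carlo Shapley-style attribution: average change in the model
    output when `feature` is added to a random subset of the other features
    (removed features are zeroed out, as in the masking described above)."""
    rng = random.Random(seed)
    others = [i for i in range(len(x)) if i != feature]
    total = 0.0
    for _ in range(n_samples):
        kept = {i for i in others if rng.random() < 0.5}  # random coalition
        masked = [v if i in kept else 0.0 for i, v in enumerate(x)]
        without = predict(masked)
        masked[feature] = x[feature]  # add the feature of interest back
        total += predict(masked) - without
    return total / n_samples

# Made-up linear "model" that leans heavily on feature 0 (the "shortcut").
predict = lambda x: 2.0 * x[0] + 0.1 * x[1]
attr_shortcut = shapley_estimate(predict, [1.0, 1.0], feature=0)
attr_other = shapley_estimate(predict, [1.0, 1.0], feature=1)
```

For a linear model the estimate recovers each coefficient times its feature value, so the shortcut feature gets an attribution twenty times larger than the other one; for a real network, a large attribution on a watermark region is exactly the kind of red flag discussed above.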
[00:52:55] It's just that active learning at the scale you need is very, very challenging. Can you actively collect ImageNet-scale data? Very challenging. It's also very challenging to know what parts of the data you're doing badly on: you need to know well enough to say "oh, the part of the data I'm doing badly on is horses without the watermark," and that's a hard thing to be able to say. So you need to know what you don't know, which is almost as challenging as robustness itself.

[00:53:23] Oh wow, there are a lot of questions now. Physicians are experimenting in end-of-life care with AI-based nudging of conversations; do you have any suggestions for patients and doctors? Yeah, that's very challenging. I do think one important thing about these high-stakes settings is to think about the alternative, and the whole decision systems that they're part of.
[00:53:51] Right. So say medical diagnosis, or, to give another example I'm more familiar with, predicting whether or not someone will commit a crime again and so should be released on parole: these are both really high-stakes prediction tasks, and the way they're performed is that there's a machine learning system and a human, and they jointly make a decision. So you need to think about not only the machine learning system, which is what I've talked about here, but also the human part: how they integrate, and how their decisions get combined. I think that's actually the important part, how humans override the machines and how they incorporate the suggestions of the machines, even more so than the predictions themselves, which I think always need to be taken carefully.

[00:54:30] Next one: how can we combine models and objectives to gain greater understanding of the world, and combine them to create intelligent behavior? Yeah, I think this is basically the open question, the unfortunate thing that we don't know how to do, and I think that's what we're struggling with in the robustness and generalization literature: what is the right approach? I think there's not really even a consensus. Is it more data collection, is it smarter ways of training the model, or is it better models? It's not clear yet what the right way is, so unfortunately I don't have a great answer beyond that.

[00:55:04] For shortcuts: people use shortcuts to identify things too, so do we have some way to understand shortcuts based on the training data? Yeah, this is a good point. Humans also use shortcuts, but I think the important thing here is that these learned shortcuts are a lot more crude than the ones we humans use, and at some level they don't even pass basic sanity checks. For entailment, we know that the output should depend on both inputs, but in reality it's only depending on the hypothesis, which is very problematic. What people do today is build these sorts of challenge sets, like examining the performance of a model based on just the hypothesis, and these, which are close to unit tests, help catch these kinds of shortcuts in many cases. So one way to detect shortcuts is things like that: our model shouldn't be sensitive to certain perturbations, or it should be sensitive to certain perturbations, and you go off of those kinds of assertions.
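Those perturbation assertions can literally be written as unit tests. The sketch below is invented for illustration: a deliberately broken hypothesis-only "model" stands in for a real entailment system, and the check encodes the sanity condition mentioned above, that the output must depend on the premise.

```python
# A unit-test-style sanity check: an entailment model's output must depend
# on the premise. `broken_model` is a hypothesis-only stand-in (invented
# for illustration); a real model would be plugged in instead.
def broken_model(premise: str, hypothesis: str) -> str:
    return "contradiction" if "never" in hypothesis.lower() else "entailment"

def premise_sensitivity_check(model, hypothesis, entailing_premise, contradicting_premise):
    """Passes only if the model answers differently for premises that
    should yield different labels for the same hypothesis."""
    return model(entailing_premise, hypothesis) != model(contradicting_premise, hypothesis)

hyp = "The economy has never been better"
passes = premise_sensitivity_check(
    broken_model,
    hyp,
    entailing_premise="The economy has never been better",   # should entail
    contradicting_premise="The economy could be better",     # should contradict
)
```

The hypothesis-only model gives the same answer for both premises, so the check fails it, which is exactly how this class of shortcut gets caught.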
number of shortcuts [00:56:05] model capacity and number of shortcuts employed are larger models more likely [00:56:07] employed are larger models more likely to happen on the correct correlations or [00:56:09] to happen on the correct correlations or are smaller models more likely to use [00:56:11] are smaller models more likely to use shortcuts [00:56:12] shortcuts that's a good question i think [00:56:15] that's a good question i think the general sense that i get from [00:56:16] the general sense that i get from reading the literature is that smaller [00:56:18] reading the literature is that smaller models are more likely to use shortcuts [00:56:20] models are more likely to use shortcuts in many ways um for example in this [00:56:23] in many ways um for example in this paper about sort of watermark [00:56:26] paper about sort of watermark based shortcuts [00:56:27] based shortcuts linear models did a lot worse like they [00:56:29] linear models did a lot worse like they would really pick up heavily on these [00:56:31] would really pick up heavily on these watermarks whereas cnns did so um with [00:56:34] watermarks whereas cnns did so um with less weights and less frequency [00:56:36] less weights and less frequency um and i think generally that's true [00:56:38] um and i think generally that's true that like large capacity models trained [00:56:40] that like large capacity models trained with a lot of data can use some of its [00:56:42] with a lot of data can use some of its capacity just to model the shortcuts and [00:56:43] capacity just to model the shortcuts and it'll still do well on the data without [00:56:45] it'll still do well on the data without shortcuts as long as they exist but [00:56:48] shortcuts as long as they exist but really the key thing here is like you [00:56:49] really the key thing here is like you need to at least see some data without [00:56:51] need to at least see some data without the shortcut pattern [00:56:54] um [00:56:55] um 
How much are you... oh, I already answered that one. Okay, great.

[00:57:00] Yeah, so I want to get into the breakout now, actually, especially because someone asked the question, you know, what's sort of the solution, and are humans robust. So, Woody, if you could drop us into a breakout session for, say, five or six minutes, it would be great to talk about these two questions. The first one is: are the brittleness issues inevitable? What do you think the solutions are, like what are the right approaches? And the second question is: are humans robust? If you think so, then what's the key ingredient that makes humans robust, or more robust than machines?

[00:57:35] Awesome, yes, I'll create the breakout rooms. If everyone wants to take a quick screenshot or try to remember these, I'll post the questions in the chat as well, but they won't be in the
breakout session.

[00:57:53] Great, um, I think I'm unmuted, right? Let's see. Yes, okay, great. All right, excellent. So I'll go through the second part a little bit quicker. I'm glad that we got so many good questions on the first part, which is the more important of the two parts of this talk. The second part is thinking a little bit about how we can do learning, how we can fix these problems, and the kinds of research that Percy and I and others at Stanford have been doing. And so the key problem that I think you should keep in mind with all of this is that the training distribution is very different from the test distribution. This is the root of all evil for these robustness problems. And so we need to think about whether the limitation that we can't generalize from train to test is inevitable, or whether we can come up with some clever data collection
schemes or model-training mechanisms that allow us to generalize. And to do this, we need to think a little bit about how distributions can shift.

[00:58:48] So I'll give you a few definitions. The first one is covariate shift, and this is what you usually think of when people say the distributions are different. Let's say you're making a face recognizer: you have these really nice well-lit portraits in the training data, and at test time you're using it with CCTVs, so all sorts of different environments. But the underlying task is the same, and there should be a single predictor that does well both on portraits and on images cropped from CCTVs.

[00:59:15] Another example is label shift, where the input features look similar, but now the output label distribution has shifted. So for example, if you're making a face recognizer
and at training time you need really precise matching, so you're only going to call detections when the images look exactly the same. But at test time you're making a product for your camera, so it can be a little bit looser, and you might deal with blurry images and so on. The litmus test for this is that you have the same predictor but you're changing your threshold: you're just saying, it's okay if I'm a little bit less confident, I'm still going to make the call. And this is an instance of label shift.

[00:59:50] And the final one, which is basically intractable in all cases, is concept drift. Here you might have a prediction task where you're initially trying to recognize faces of the same people, but then at test time you want to match people across time, like young pictures and old
pictures. The task is fundamentally different depending on whether you're matching the same person or a person shifted in time, and so no one predictor is going to do really well on both of these tasks. There's a fundamental change in the task definition.

[01:00:20] Now I'm going to go over ways to deal with all of these problems. The first one is that we're just going to collect more data; someone asked about this earlier, and this is the key thing: if we get more data, we can do a lot more things. The second part is a little bit more ambitious, and it's going to say, let's try to make do with only the data we have. So the first idea is, let's just say we're going to try to generalize to a test set, and we're just going to collect more data. A classic example of this kind of task is recognizing
digits. So you have images of digits: the left one is MNIST, which is really old; USPS, which is even older, actually; and SVHN, which is more modern, where you recognize digits from house numbers taken in the wild. In all of these cases you need to output the number: this one's a 2, this one's a 7, and so on.

[01:01:09] Now, on collecting new data: let's say we trained a model on MNIST and we want to do well on SVHN at test time, so this is a distribution shift. But we might be able to collect more data; maybe it's unrealistic to say we only have this and we have to predict that. And so what we might have is some unlabeled data from SVHN: we can't afford someone labeling it by hand, but we might just be able to get the images, and this is called unsupervised domain adaptation. And if we can collect labels, that's
called supervised domain adaptation, and that's even better.

[01:01:43] And so we can ask: when can we do learning? If we're in covariate shift and we have this source data, we might be able to do learning, because we have a better model that can adapt to these different kinds of distribution shifts that occur. So the general thing you should think about is that if we're in a covariate-shift setting, and we have source-domain data plus unlabeled data from our target distribution, then we can actually sometimes generalize to our target task even though we don't have labels.

[01:02:12] And so this is the setting that we're going to be talking about for the rest of the talk, the remaining couple of minutes, where we have this covariate-shift problem where the prediction task
is fundamentally the same, and we ask how we can generalize.

[01:02:29] The easiest and most classic thing to do is re-weighting. Let's say we have training data that's 90% frontal images but 10% images taken from the side, and we want to generalize to test data that's 50/50 front and side. So how can we do this? Well, let's just re-weight the data set, so that each frontal-facing image counts for less and each side-facing image counts for more. We've artificially rebalanced the data, and this gives us an assumption-free way of getting estimates of how well our model will perform on this 50/50 test set. And this applies to everything I've talked about before: if your data is imbalanced, has a minority group, or maybe has shortcuts, maybe we can rebalance it to get rid of all these problems.
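The 90/10-to-50/50 re-weighting just described can be sketched as follows. This is a minimal illustration, not a library API: `importance_weights` and `reweighted_error` are hypothetical helpers, and the per-group error rates are made-up numbers.

```python
def importance_weights(train_props, test_props):
    """Weight for each group: p_test(group) / p_train(group)."""
    return {g: test_props[g] / train_props[g] for g in test_props}

def reweighted_error(group_errors, train_props, weights):
    """Estimate test error by re-weighting per-group training error rates."""
    return sum(train_props[g] * weights[g] * group_errors[g] for g in group_errors)

# Training data: 90% frontal, 10% side; test data: 50/50.
w = importance_weights({'front': 0.9, 'side': 0.1},
                       {'front': 0.5, 'side': 0.5})

# Hypothetical per-group error rates of a fixed model on the training data.
est = reweighted_error({'front': 0.02, 'side': 0.10},
                       {'front': 0.9, 'side': 0.1}, w)
# est works out to 0.5 * 0.02 + 0.5 * 0.10 = 0.06: each side-facing image
# now counts five times as much, and each frontal image counts for less.
```

Note how the weight for the side-facing group is 0.5 / 0.1 = 5, which is exactly the "counts for more" factor from the talk.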
So, are we done? No, because even though I talked about it that way, suppose we restructure the data set so that the training data is 100% men and the test data is 100% women. There's no overlap, and if we try to re-weight this data we're going to get infinite error, because we need to infinitely upweight the women that we don't have in our training data. This is the fundamental problem with all of these approaches: when you don't have any overlap, your estimates all blow up and go to infinity. And in the real world, everything is non-overlapping, so usually these kinds of estimates don't work.

[01:03:46] But intuitively we might think these kinds of tasks are possible, and the reason why you and I and many others think this is possible is this intuition: let's say we have training data that's blue images and test data that's orange images.
Clearly there is no overlap between any of these images; they're in different color channels. But if we desaturate the images, we can perform prediction on the desaturated image, and we'll get really good performance. And so the intuition is that if we can't distinguish the two domains, because we've desaturated the images, we might do really well. This is the idea behind most of modern domain adaptation, where you learn to represent your data in a space that doesn't change when you go from the training to the test distribution. You measure how much your data changes in this higher-level representation, and if your data is close, then you're going to do really well.
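The desaturation intuition can be made concrete with a toy sketch. Everything here is invented for illustration (the pixel values, the light-vs-dark task, and the helper names): mapping both domains to grayscale is a hand-built version of the domain-invariant representation just described.

```python
def desaturate(pixel):
    """Grayscale value of an (r, g, b) pixel: a representation that
    throws away the color information separating the two domains."""
    r, g, b = pixel
    return (r + g + b) / 3.0

def classify_brightness(pixel, threshold=0.5):
    """Toy predictor that only looks at the desaturated value, so it
    behaves the same on blue-tinted and orange-tinted inputs."""
    return 'light' if desaturate(pixel) >= threshold else 'dark'

# A "blue domain" training pixel and an "orange domain" test pixel with
# the same underlying brightness (made-up values, channels swapped).
blue_pixel = (0.3, 0.5, 1.0)
orange_pixel = (1.0, 0.5, 0.3)
```

The two pixels are trivially distinguishable in raw RGB, but after desaturation they map to the same value, so any predictor built on top of that representation treats the two domains identically.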
And the thing to keep in mind is that your test performance is going to be your training performance plus some sort of distance that measures how different the training and test distributions are, and if you keep this small, you're going to do really well.

[01:04:53] So you can think about this very simply: the test error of a model is the source performance, how well we do on the training data, plus the gap between train and test. And the idea is that we're going to look for a domain distance where, no matter what model we pick, we're going to do well because the distributions look so similar. If images are desaturated and they look identical, it doesn't matter what the model is; it performs identically on both. And so if we do well on the source domain and the domain distance is low, we might be able to generalize. And this is really interesting and optimistic: these all seem like things that we can measure and think about, and they give us conditions
under which we might be able to do well on a test distribution. And there's been a lot of work over the last almost two decades now on these kinds of domain distances and bounds and how you can learn from unlabeled data. They give great intuition and let you think carefully about these kinds of problems, but unfortunately, if you try to actually compute these bounds to get a guarantee on test error, they're usually vacuous: you'll get things like "the accuracy will be greater than zero" and "the error will be less than one", which is not super helpful.

[01:06:05] And just to go over how these kinds of things often work in practice: the domain distance I just described is the basis for a lot of modern domain adaptation methods, and the key idea is that neural nets are everywhere because neural nets work, and you use
them as a way to measure the domain distance. The idea here is that you have one part of your model maximizing performance on the training data, and you have another part of the model making sure that you can't distinguish between the training and the test distribution in what's called a bottleneck feature space: at this level your training and test data should be indistinguishable, and yet useful for actually doing the task.

[01:06:47] And so we have this hope: we have this re-weighting approach, which has no model dependence, and these model-based domain distances, which require us to carefully construct a neural network. But on the other hand, that's the only way we can get these things to work in the real world, because with re-weighting these weights are often infinite.
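A crude, hedged sketch of the "use a classifier to measure domain distance" idea just described: here the discriminator is a one-dimensional threshold sweep rather than a neural network, and all the feature values are made up. Accuracy near 1.0 means the domains are easy to tell apart; accuracy near 0.5 (chance) means the representation is domain-invariant.

```python
def domain_classifier_accuracy(source_feats, target_feats):
    """Best accuracy of a threshold rule at telling source (label 0)
    from target (label 1) given one scalar feature per example; a
    stand-in for the neural domain discriminator."""
    data = [(f, 0) for f in source_feats] + [(f, 1) for f in target_feats]
    best = 0.0
    for thr, _ in data:
        for target_is_above in (True, False):
            correct = sum(
                ((f >= thr) == target_is_above) == (lbl == 1)
                for f, lbl in data
            )
            best = max(best, correct / len(data))
    return best

# Raw color features separate the two domains perfectly...
raw_source = [0.1, 0.2, 0.3]   # made-up "blueness" scores
raw_target = [0.7, 0.8, 0.9]

# ...but after a desaturation-style mapping, the domains overlap exactly.
inv_source = [0.4, 0.5, 0.6]
inv_target = [0.4, 0.5, 0.6]
```

In the raw features the classifier reaches perfect accuracy (large domain distance), while on the invariant features it can do no better than chance, which is exactly the property the bottleneck feature space is trained to have.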
So there's no free lunch: we either need model assumptions or assumptions about overlap between the domains. But if we have one of those, plus unlabeled data on the test domain, we can actually sometimes do well.

[01:07:20] The other approach, which I'll go over a little quickly because I'm running low on time, I can describe at a very high level with this idea. As I said before, the main issue is that our training distribution and our test distribution are different. If they were the same, we'd be done, but they're not. But what if I told you: I'll give you a list of 100 possible test distributions, and your true test distribution is going to be one among these hundred. Then we can train a model to do well on all of these: we just go through each one of these test sets and say our model has to do well on the worst one, and if we can get a model that works on all of them, we know that our model is going to do well
on the true test set. So this is thinking about a potential set of test distributions and considering the worst case, and this is what's called a min-max optimization problem. We're going to find the best model (that's the min part) that works well over the worst possible potential test set (that's the max part). This idea is going to work whenever the true test set is contained in this uncertainty set Q: we're taking the worst case over this big Q, which is the set of potential test distributions. And this fails whenever Q is too small or too big. If Q doesn't contain the real test distribution, you've got no guarantees; if Q is so big that it contains everything, then your model is going to be so pessimistic, because it has to be prepared for any possible distribution, that it's just going to predict 50/50 or something vacuous for all of your inputs.
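The min-max idea can be sketched with a tiny uncertainty set Q. Everything here is made up for illustration: three candidate test distributions over two groups, and two candidate "models" given directly as hypothetical per-group error rates. A real version would optimize model parameters, not pick a name from a dictionary.

```python
def worst_case_error(group_errors, uncertainty_set):
    """Max part: the model's error under the worst distribution in Q."""
    return max(
        sum(p * group_errors[g] for g, p in dist.items())
        for dist in uncertainty_set
    )

def minimax_pick(models, uncertainty_set):
    """Min part: choose the model with the smallest worst-case error."""
    return min(models,
               key=lambda name: worst_case_error(models[name], uncertainty_set))

# Q: the true test distribution is promised to be one of these.
Q = [
    {'front': 0.9, 'side': 0.1},
    {'front': 0.5, 'side': 0.5},
    {'front': 0.1, 'side': 0.9},
]

# Hypothetical per-group error rates for two candidate models.
models = {
    'average-case': {'front': 0.01, 'side': 0.30},  # great on 'front', weak on 'side'
    'robust':       {'front': 0.05, 'side': 0.06},  # slightly worse everywhere, no weak group
}
```

The average-case model wins under the 90/10 training-like distribution but collapses under the side-heavy one, so the min-max criterion prefers the robust model, whose worst case over Q stays small.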
[01:08:48] I'm going to skip over a few of the examples and go to this slide, and say that these kinds of ideas can be applied in each one of the settings that I described before: minority groups in fairness, adversarial examples, or shortcuts, by carefully choosing the kinds of worst-case groups. In the case of minority groups, we basically explicitly list out all the possible minorities that we care about, and we consider the worst-case performance over all of those. For adversarial examples, we know that the images before and after perturbation are close, so we consider all distributions that are nearby each other in pixel space, and then we optimize for the worst case over those. And then for shortcuts, what we can do is explicitly construct groups that don't contain some of these shortcuts, and we enumerate all such groups
and then make sure that these worst-case groups work well. So for example, if we have a model that relies too much on backgrounds, we construct subgroups of the data that have mismatching backgrounds and objects.

[01:09:52] So, to basically wrap up here: the limits of this kind of approach are that if we pick too small a worst-case group we get no robustness, and if we pick too broad a worst-case group we get vacuous models, and there's no simple or general principle for designing these losses, even though this approach gives us really nice ways of thinking about and optimizing models for the worst case and getting guarantees.

[01:10:13] Okay, I'm going to wrap up there. If anyone has questions, I would be happy to answer them; I can stay for a little bit longer and chat if people have questions.

[01:10:24] All right, maybe we should just thank Tatsu for his really insightful and interesting
And then maybe run off if they have to go. So thanks, Tatsu. [01:10:39] Thank you; really a lot of interesting things, food for thought. So I hope that everyone has had their eyes opened with respect to all the different problems that we're seeing, and hopefully you're motivated to help solve some of these, because I think there are a lot of interesting open research questions here.

================================================================================ LECTURE 053 ================================================================================
Fireside Talks: State of Robotics I Automation and Robotics Engineering Lectures - Stanford
Source: https://www.youtube.com/watch?v=hVsR9DdR3qE
---
Transcript

[00:00:05] Hi everyone. Okay, let's start 221, the second lecture, or rather the second week, of this quarter. Yeah, hi, I'm Dorsa; you saw me last time. I'm co-teaching this class with Percy, and today our plan is to talk a little bit about robotics. This is going to be kind of an informal introduction to robotics: a little bit of history, a little bit of
state of the art, some cool videos, and a bit of chat. I don't need to finish my slides; I have a lot of slides and I'm probably not going to finish them, so feel free to interrupt me at any point in time if you have questions about anything. We can make this an informal discussion, and I will probably not cover everything that is in the slides anyway.

[00:00:53] All right, so let's get into some quick logistics. This was our plan for the class; I'm sure you've seen this from the last lecture. The plan was to start with reflex-based models, and we've kind of already started that: Percy basically went over the machine learning components last week, and we also have another week of machine learning and a little bit of deep learning. Then, starting next week, we are going to talk about state-based models, and I'm going to cover a lot of that: we're going to do search, MDPs, and games. I'm going to reuse a lot of videos from last year, just giving you a heads up; we'll probably add and remove some, but last year's videos are there, we have a whiteboard, everything is great, so I'll probably reuse a lot of that, but we basically plan to break them into modules. Then Percy will cover variable-based models, and I will cover logic and finish the class. So that was just a quick overview of what the plan is.

[00:01:55] If you remember, Monday lectures are not modules, right? Monday lectures are going to be guest talks and chats and having fun. So, just to give you an idea of the Monday lectures, this is a tentative schedule. You've already had the introduction to AI; Percy did that. Today I will be doing this talk on the state of robotics, talking a little bit about what that is and why you should care about it. Then next Monday we have a guest speaker, Mariano-Florentino Cuéllar. Spoiler: he is a faculty member in the law school, and he's also on the California Supreme Court, so it should be a fun talk to attend. He does a lot of work around AI and law; he actually teaches a class on regulating AI, and he has a lot of interesting opinions on that, so I totally recommend showing up for that. I think it would be a lot of fun.

[00:02:46] Then the week after we have Tatsu Hashimoto. Tatsu is a new faculty member in the CS department; he does a lot of work around robust machine learning, so he'll probably be talking about that. That's followed by Percy talking about the state of natural language processing in week five. By the way, this is tentative, so do not quote me on it; some of these dates might change. I think the
speakers are accurate, but the dates might move around. And then finally, in week six we have Emma Pierson talking about AI, equality, and healthcare. She'll be a faculty member at Cornell Tech, joining next year, so it will be interesting to hear from her.

[00:03:22] Week seven is kind of like a fun chat: Percy and I will just show up and you can ask us anything, basically. We plan to talk about things like grad school, so if you have any questions about that, about research, or about what to do after 221, I think that's a good week to attend. In week eight we have Yoav Shoham. He was a faculty member at Stanford, and he has done a lot of work in AI from back in the day, through the whole revolution of AI, so it would be really fun to show up for his talk. And then in week nine we have Drago Anguelov. Drago is the head of autonomous driving at Waymo, so if you're interested in learning about autonomous driving, to the extent that he can talk about it, that would be week nine. And then in week 10 I will do a conclusion and wrap up the class. So that is kind of our plan for the Monday lectures; I just wanted to advertise it so you have an idea of what will show up in the next couple of weeks.

[00:04:24] All right, any questions? If you have any questions, by the way, just put them in the chat or interrupt. All right, so today we want to talk about robotics, and I just wanted to start it off; I have a lot of videos today, so it'll be fun. I wanted to start off with a video showing that robots can dance, just to advertise this.

[00:04:48] [Music] You guys can hear this, right? Yeah, okay. [00:05:01] [Music]

[00:05:20] Anyway, I wanted to start with this video just to motivate why we care about robotics. This is Spot from
Boston Dynamics. Boston Dynamics is a company that does a lot of really cool robotics work; we'll see more of their robots later in this lecture. But robots can dance, they're cool, and let's just start the conversation with this question: what is a robot, and when is it that we call something a robot? So, a question that I have, and I think it would be a good starter, is for you to go into breakout rooms for two to three minutes and chat about this. The question is: think about a hammer. Is a hammer a robot? What do you think? And think about Google, or Google Home, or Siri: is that a robot? So what defines a robot? That's just a starter to get you thinking about what a robot is and why we should care about it. Go to the breakout rooms for two to three minutes, introduce yourself, talk to the people in your breakout room, and then put your answers in the chat when you come back, and we'll continue.

[00:06:28] Yes, I believe everyone's back. All right, I hope you just met your friends and other people in the class, and if you have thoughts, put them in the chat; we'll look at them later. What is a robot? But let's actually continue with our talk today, because again, I have a lot of videos.

[00:06:46] So my plan for today is to do a bunch of things. I'm going to start with a quick history of robotics: where did it come from, and why do we care about it? Then I'm going to spend a bit of time talking about why you should care about robots, and why this class should care about robots: how are robots related to AI, and what is their relationship? And then I wanted to spend a little bit of time talking about robotics at Stanford: what research is done at Stanford, and who are the faculty who do robotics here, just so you know the faces, you know what classes to take, what type of research is done, and how you can get in touch with them. This is probably as far as I'll get, but if I have time I'm going to talk a little bit about some exciting robotics applications, all the awesome things that are happening, and also the not-so-awesome things: the fact that robots are far from perfect, or not there yet. And then if I have time, which is very likely not to be the case, I will talk a little bit about my own research around interactive robotics.

[00:07:49] So again, the rule is: at any point in time, interrupt, just ask questions, raise your hand, and we'll go from there. Let's just jump into this quick history of robotics: where does it come
history of robotics where does it come from where does the word robot even [00:08:05] from where does the word robot even comes from [00:08:06] comes from so the word robot actually is kind of [00:08:08] so the word robot actually is kind of old it comes from this display from carl [00:08:11] old it comes from this display from carl catholic in 1921 the play is called the [00:08:13] catholic in 1921 the play is called the rosen's universal robots and it's about [00:08:16] rosen's universal robots and it's about basically this mechanical men that are [00:08:18] basically this mechanical men that are built in factory and are supposed to do [00:08:20] built in factory and are supposed to do work for humans and then they rise [00:08:23] work for humans and then they rise against humans so that's kind of like [00:08:25] against humans so that's kind of like the part of it um and and basically it [00:08:28] the part of it um and and basically it has a little bit of a dystopian view of [00:08:30] has a little bit of a dystopian view of that and and the word these mechanical [00:08:32] that and and the word these mechanical men basically are are called robota [00:08:35] men basically are are called robota which basically means slave or kind of [00:08:38] which basically means slave or kind of like labor type work in czech i don't [00:08:41] like labor type work in czech i don't know if anyone knows czech and i don't [00:08:42] know if anyone knows czech and i don't know how accurate that is but that's [00:08:44] know how accurate that is but that's basically what i read on wikipedia so i [00:08:46] basically what i read on wikipedia so i assume that is that is accurate [00:08:48] assume that is that is accurate um so that was the word robot but [00:08:50] um so that was the word robot but then the word robotic was also first um [00:08:53] then the word robotic was also first um introduced by this guy isaac asimov [00:08:56] introduced by this guy isaac asimov who is a 
science fiction writer and then [00:08:58] who is a science fiction writer and then he came around 1950s and he wrote a [00:09:01] he came around 1950s and he wrote a bunch of books about robots and and the [00:09:03] bunch of books about robots and and the view of it was a little bit nicer it was [00:09:05] view of it was a little bit nicer it was a little bit friendlier than this [00:09:06] a little bit friendlier than this dystopian view and and he talked about [00:09:09] dystopian view and and he talked about these different rules of robotics the [00:09:11] these different rules of robotics the robots were there to help humans and [00:09:13] robots were there to help humans and they were supposed to like follow these [00:09:14] they were supposed to like follow these different rules so you might you guys [00:09:16] different rules so you might you guys might have heard of these three rules of [00:09:18] might have heard of these three rules of robotics by azek asimov so the first one [00:09:21] robotics by azek asimov so the first one is that a robot may not injure a human [00:09:23] is that a robot may not injure a human being or through inaction allow a human [00:09:26] being or through inaction allow a human being to come to harm [00:09:28] being to come to harm the second one is a robot must obey the [00:09:30] the second one is a robot must obey the orders given given it by human beings [00:09:32] orders given given it by human beings except for each or where such orders [00:09:35] except for each or where such orders with conflict with the first law [00:09:38] with conflict with the first law and then the last one is a robot must [00:09:40] and then the last one is a robot must protect its own existence as long as [00:09:42] protect its own existence as long as such protection does not conflict with [00:09:43] such protection does not conflict with the first and second one okay so kind of [00:09:46] the first and second one okay so kind of like [00:09:47] 
like this is this is obvious right like you [00:09:48] this is this is obvious right like you don't want the robots to to go against [00:09:50] don't want the robots to to go against humans and you don't want the robot to [00:09:52] humans and you don't want the robot to kill itself and and these are kind of [00:09:55] kill itself and and these are kind of the three rules and and the reason i'm [00:09:57] the three rules and and the reason i'm mentioning this is that people are [00:09:58] mentioning this is that people are coming back to these rules like even [00:10:00] coming back to these rules like even these days like when you're people are [00:10:02] these days like when you're people are thinking about robots they feel like oh [00:10:03] thinking about robots they feel like oh a robot should try to satisfy these [00:10:05] a robot should try to satisfy these three laws of robotics and these are [00:10:07] three laws of robotics and these are kind of like the core laws that need to [00:10:09] kind of like the core laws that need to be satisfied [00:10:10] be satisfied and and the thing about these laws is [00:10:12] and and the thing about these laws is sure it is nice but they're kind of [00:10:14] sure it is nice but they're kind of obvious right and actually satisfying [00:10:17] obvious right and actually satisfying them is the most difficult part and it [00:10:19] them is the most difficult part and it doesn't really like go go through that [00:10:21] doesn't really like go go through that like like if i'm if i have my robot [00:10:23] like like if i'm if i have my robot running gradient descent on the loss [00:10:25] running gradient descent on the loss function how do i define that loss [00:10:27] function how do i define that loss function accurately so i satisfy these [00:10:29] function accurately so i satisfy these these rules like that is not a very [00:10:31] these rules like that is not a very obvious thing and then i guess get it in [00:10:33] 
[00:10:35] Let me give you an example. Let's say I have Rosie the robot that's supposed to clean my house, or I have a Roomba that's supposed to clean my house, and let's say I have built this super nice, intricate house of cards. Any human who was helping me clean my house would know that you shouldn't touch this house of cards, because I spent so much time on it and it is so valuable to me; you shouldn't go and clean it up. But a robot wouldn't know that. Why would it? Why would it know what a house of cards is, or how much energy I've put into creating it? So that is kind of the objective-function problem. And sure, it's not about harm exactly; well, it might be harming me, it's harming humans; but in general, thinking about what objective the robot should satisfy, what the reward function should be (we'll be talking about MDPs and reward functions in a couple of weeks) is actually a very difficult problem. This is an active area of research: trying to understand what the human preferences are, what humans actually want robotic agents to do around them, and at the same time what the robot thinks those preferences are. There's always going to be a mismatch between the two, and how harmful is that mismatch going to be? How unsafe is it that the robot doesn't know everything?

[00:11:53] And you might ask, well, why don't you just write that as an objective: "hey robot, don't go and destroy Dorsa's house of cards"? That's perfectly fine, but the thing is, it is really hard to write out all these specifications, all these properties that we would want satisfied in the world, because there's just so much context, so much information in the world. Humans just know these things; as they grow up, they learn them; and robots wouldn't necessarily know them. And you might say, that's just data, so let's just learn it from data. A lot of people agree with that and a lot of people disagree with that; it's still a point of debate whether it's just data, whether I just need to show the robot more examples so it knows that houses of cards are important to people. But in general, the point of this slide is: hey, these laws of Asimov's that are put out there are not that obvious to satisfy. Sure, I can say "don't kill humans," but it's not really obvious how I write out what it means to not harm humans.
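The specification problem above can be made concrete with a toy sketch. This is a hypothetical example of my own, not anything from the lecture: a cleaning robot whose reward only counts mess removed will prefer to flatten the house of cards, because nothing in its objective says not to.

```python
# Toy example of reward misspecification (hypothetical names and numbers).
def misspecified_reward(state):
    # The designer only wrote down "less mess is better".
    return -state["mess"]

def true_preference(state):
    # What the human actually wants: clean, but never touch the card house.
    return -state["mess"] - 100 * state["card_house_destroyed"]

# Two outcomes the robot compares under each objective.
leave_cards = {"mess": 1, "card_house_destroyed": 0}
clean_everything = {"mess": 0, "card_house_destroyed": 1}

# Under the misspecified reward, destroying the cards looks strictly better;
# under the true preference, it is far worse.
print(misspecified_reward(clean_everything) > misspecified_reward(leave_cards))  # True
print(true_preference(clean_everything) > true_preference(leave_cards))          # False
```

The gap between the two functions is exactly the mismatch the lecture describes: the robot optimizes the objective it was given, not the one the human had in mind.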
want to make here [00:12:57] and the second point i want to make here is that even that is still under [00:13:00] is that even that is still under question then not harming humans it's [00:13:02] question then not harming humans it's actually not obvious to everyone that we [00:13:04] actually not obvious to everyone that we shouldn't use robots not harm humans [00:13:07] shouldn't use robots not harm humans which is a little bit silly in my [00:13:08] which is a little bit silly in my opinion but i'm just talking about [00:13:10] opinion but i'm just talking about everyone's opinion here [00:13:12] everyone's opinion here so like if you think about if you think [00:13:14] so like if you think about if you think about like our defense or other other [00:13:16] about like our defense or other other countries defense right like uh people [00:13:19] countries defense right like uh people use these things called autonomous [00:13:21] use these things called autonomous weapon systems which are basically like [00:13:24] weapon systems which are basically like you can have drones that can detect [00:13:26] you can have drones that can detect targets and shoot at them you can have [00:13:28] targets and shoot at them you can have detail autonomous weapon systems these [00:13:30] detail autonomous weapon systems these are commonly referred to laws [00:13:33] are commonly referred to laws by laws and lethal autonomous weapon [00:13:35] by laws and lethal autonomous weapon system basically and there it's a big [00:13:37] system basically and there it's a big question should we use these should we [00:13:39] question should we use these should we not use this when can we use like we [00:13:41] not use this when can we use like we thought autonomous weapon systems and [00:13:43] thought autonomous weapon systems and and yeah it's not like it is here it's [00:13:45] and yeah it's not like it is here it's not a thing that's science fiction it's [00:13:47] not a thing that's 
science fiction it's actually a thing that uh our like [00:13:50] actually a thing that uh our like basically u.s defense has it like like [00:13:52] basically u.s defense has it like like the whole like all over different [00:13:54] the whole like all over different countries have some version of this [00:13:56] countries have some version of this and and the question is like yeah if we [00:13:58] and and the question is like yeah if we don't use it like other countries might [00:14:00] don't use it like other countries might use it and how do we think about um like [00:14:02] use it and how do we think about um like using or not using these systems like [00:14:04] using or not using these systems like how do we put a ban event on it what [00:14:05] how do we put a ban event on it what does a ban on it mean [00:14:07] does a ban on it mean um and there's a lot of um debate around [00:14:09] um and there's a lot of um debate around this start russell who's a faculty at uc [00:14:12] this start russell who's a faculty at uc berkeley and he also features a either [00:14:14] berkeley and he also features a either he is he's basically a proponent of [00:14:16] he is he's basically a proponent of banning or fully banning and with [00:14:18] banning or fully banning and with autonomous weapon systems and he has a [00:14:20] autonomous weapon systems and he has a lot of interesting talks around this we [00:14:22] lot of interesting talks around this we will talk about this a little bit more [00:14:24] will talk about this a little bit more toward um like in in the conclusion [00:14:26] toward um like in in the conclusion lecture but this is also something i [00:14:28] lecture but this is also something i wanted to mention because it could be an [00:14:29] wanted to mention because it could be an interesting topic to talk to tino about [00:14:31] interesting topic to talk to tino about next week like when you think about laws [00:14:33] next week like when you think about 
[00:14:35] When you think about laws and regulating these things, how do the regulations actually work and make sense? So yeah, even not harming humans, which is what Isaac Asimov said, is still under question here; it's not clear that that is what we want to do.

But okay, let's go back to the history of robotics: why do we have robotics, and when did it start? Around the 50s and 60s there was a lot of excitement around AI. Percy was talking about the history of AI last week, and that was the time when there was a ton of excitement. Even Turing has a paper where he writes that the best thing we can do is to build a robot with TV cameras for its eyes and motors for its legs, and have it run around the countryside and learn from the world. So this is from back in Turing's time, and this is what he was
thinking. [00:15:27] And even in the same paper, later on, Turing says: well, this is too difficult, I don't want to deal with this physical interaction with the countryside, so instead maybe we should focus on the problem of intelligence, maybe we should focus on AI. And that is how the next 50 years came to be all about AI and building good AI systems. There was a lot around robotics too; sure, robotics has also seen a lot of advances since the 50s, but a lot more has happened on the AI side of things, just because the robotics side was so difficult.

An example of that is Deep Blue, which won its first game against the world chess champion in 1996. It was doing amazing intelligence, right? It was able to win at the game of chess. But the thing that was
happening was that the chess pieces [00:16:17] were moved by humans, because that part was so difficult. Grasping is still so difficult when you think about a robot trying to actually move these pieces, and that was not solved in 1996 in any way at all.

All right, so when did the first robot come? I've been talking about this history, and the question is: what was the first robot out there, the first intelligent robot? The first intelligent robot was Shakey. I have a video of it here; it's about a five-minute video, so it's a little bit long, but let's just watch it. I think it has a lot of interesting history in it.

[00:16:53] Shakey was the world's first mobile intelligent robot, embodying numerous breakthroughs in artificial intelligence, robotics, computer vision, navigation, and other research areas. The robot was
developed from 1966 to 1972 [00:17:11] by SRI International, then called Stanford Research Institute, and its legacy and impact are still very much alive today. Shakey is really the great-grandfather of things like self-driving cars and even military drones. The hardware was really pretty primitive, but the software architecture and the software algorithms are what changed the world. I think we all thought we were doing really interesting stuff, so it didn't really dawn on us that we were doing anything special. Shakey established a position about what we should be thinking about as possible, as feasible.

To understand why Shakey is so important, we have to go back to 1966, and we have to understand where AI research was at that time. Well, you have to remember that it was pretty much a green field when Shakey started. All over the country, and even outside of the United States, people were beginning
to build the components of artificial intelligence. [00:18:07] Nobody had tried, at the time that Shakey was launched, to integrate all the components of AI and robotics into a single moving vehicle that could reason about the world, could sense the world around it, and could take actions. Prior to 1966 there were no robots, or at least no intelligent ones.

[Music]

The concept of an intelligent robot was limited to the realm of fiction: "You will meet a charming character in the robot, always at your service." When you read the title of the original proposal, it was something like "a mobile automaton for reconnaissance," and the reason we called it an automaton was that, until Shakey, you couldn't go to a funding agency and say, I want money to make a science-fiction kind of device. So we needed a name, and finally Charlie, in his inimitable
fashion, said: [00:19:06] it shakes like hell when it moves, let's just call it Shakey.

Key components of Shakey's hardware were a TV camera to observe its environment, an antenna radio link, bump detectors, and a push bar to move objects. My role was mainly to get the images and get whatever coordinates they needed to determine where they were, and, you know, extract the information from the image. I remember when I first saw it: gee, that looks like a dishwasher on wheels. While charming, Shakey wasn't impressive for its looks; it was the AI and programming advancements that made it famous. We structured Shakey's software in four distinct layers, and that was the first time the layered architecture was used for robots. Shakey's pioneering software architecture paved the way to a new era of AI and robotics. The SRI team later developed Flakey, a research robot that demonstrated
fuzzy logic and goal-oriented behavior. [00:20:07] Then came Centibots, one of the earliest projects in swarm robotics, where 100 autonomous robots demonstrated the ability to map a complex area collaboratively. I like how it's code that isn't just turning numbers into other numbers; you get to see the thing come to life right next to you.

Shakey also inspired research in natural-language-based interaction, leading to the popular speech-based technologies that we use today. Shakey's breakthroughs in computer vision are now used to help drivers stay in their lanes. And every time you get driving directions on your phone or navigation system, you are benefiting from the A* navigation algorithm, first invented for Shakey. Even NASA's Mars exploration rovers use navigational techniques that were first launched with Shakey. The future is things like potentially
having teams of autonomous aircraft [00:21:05] that could go out, for example, and do firefighting, either fully autonomously or potentially in tandem with human-piloted aircraft that can go out and work with them collaboratively.

Shakey now resides in the Computer History Museum, visible to hundreds of thousands of visitors annually, and in 2017 Shakey was honored with an IEEE Milestone achievement award. The Shakey milestone is important because, first of all, Shakey is the world's first mobile intelligent robot. In addition, this is the first IEEE Milestone in the areas of either robotics or artificial intelligence. Looking back, more than 50 years after the Shakey project began, it's inspiring to see how one small team can make such an impact, how one ambitious idea continues to benefit our lives, how one robot changed the world. We didn't realize, I think, any of us, what the significance of this
was. [00:22:02] We knew we were the first, but nobody knew where it was going, and I don't think any of us would have predicted what happened. Shakey planted the flag way out there. It's a model of the kind of ambitious projects that we should be looking at in the future.

[00:22:19] All right, so that was Shakey's video. Shakey is actually at the Computer History Museum just down the road, so when things open up, I suggest going there and seeing it. Cool, so that was my quick history of robotics. Any questions? Thumbs up, just feel free to ask, or save it for later.

So in the next part, what I'd like to do is talk a little bit about how this is related to some of the topics that we're learning in this class: how are robots in general using ideas from AI? I want to spend a little bit of time on that. So,
if you think about robotics, [00:23:02] there's this common architecture that is usually used for robots. It's more under question these days, but back in the day this was the architecture that a lot of robots tended to use, which is the sense-plan-act architecture, and then looping over that. So you sense the world, you perceive the world, you do perception, and from that you try to plan what to do next; that is where the intelligence lies. And after that you just act, you execute that plan. Once you've acted, you can go back and sense, plan, and act again. That's a very common architecture that most robots use.

These days people are thinking about a more intertwined relationship between sense, plan, and act. For example, you shouldn't just sense the world for the sake of sensing; sensing
needs to be active. [00:23:54] So there's this area called active perception, which is about the idea that I only sense the parts I care about and need to act on, and we should have this intertwined relationship between acting and sensing. Or there's another paradigm these days that basically tries to go from images directly to actuation, kind of skipping the planning part by replacing it with neural networks. So if I have a machine learning system and I start from images, can I directly get a control for my robot? That's another paradigm; I'll actually talk about that a little bit later in this section. But for now, let's just consider this paradigm of sensing, planning, and acting.
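The sense-plan-act loop she describes can be sketched in a few lines of Python; `EchoWorld` and the one-dimensional "move toward a target" planner are hypothetical stand-ins for illustration, not any real robotics API:

```python
# A minimal sketch of the sense-plan-act loop; EchoWorld and plan() are
# made-up illustrations, not a real robot interface.

class EchoWorld:
    """Toy world: the robot's state is a single integer position."""
    def __init__(self, state=0):
        self.state = state

    def sense(self):            # SENSE: perceive the current world state
        return self.state

    def act(self, command):     # ACT: execute the planned command
        self.state += command

def plan(observation, goal):
    """PLAN: decide the next command from the latest observation."""
    if observation < goal:
        return 1
    if observation > goal:
        return -1
    return 0                    # already at the goal

def sense_plan_act(world, goal, max_steps=100):
    for _ in range(max_steps):  # loop: sense -> plan -> act, then repeat
        obs = world.sense()
        cmd = plan(obs, goal)
        if cmd == 0:
            return obs
        world.act(cmd)
    return world.sense()
```

Calling `sense_plan_act(EchoWorld(0), goal=5)` walks the toy robot to its target one step at a time; the "intertwined" variants mentioned above would, in effect, let the planner also choose what to sense next.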
[00:24:40] In this class, starting next week, we're going to first talk about search, and actually, as you heard in the video, we're going to talk about algorithms like A*. That's something we will discuss next week, and A* was introduced for Shakey. It's basically an extension of Dijkstra's algorithm that adds a heuristic, it's fairly fast, and it was introduced for things like robots moving around and navigating. Today we use a lot of sampling-based techniques, so the algorithm that you see running here is called RRT*, which is similar to A* to some extent but is a sampling-based algorithm: it creates a dense tree and navigates along the right branches of that tree. So the type of thing we will be talking about next week, search, actually falls largely under planning for robots: when you're planning for robots, you are searching in that space.
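Since A* comes up next week, here is a minimal sketch of the "Dijkstra plus a heuristic" idea; the grid, walls, and Manhattan-distance heuristic below are invented for illustration, not Shakey's original code:

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A*: Dijkstra's algorithm plus a heuristic h(n) that lower-bounds
    the remaining cost to the goal."""
    frontier = [(h(start), 0, start)]          # entries are (f = g + h, g, node)
    best_g = {start: 0}
    came_from = {}
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:                        # reconstruct the path
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return path[::-1]
        if g > best_g.get(node, float("inf")):
            continue                            # stale queue entry
        for nxt, cost in neighbors(node):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                came_from[nxt] = node
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None                                 # goal unreachable

# Toy 3x3 grid with a small wall; the robot must route around it.
walls = {(1, 0), (1, 1)}
def grid_neighbors(p):
    x, y = p
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < 3 and 0 <= ny < 3 and (nx, ny) not in walls:
            yield (nx, ny), 1

path = a_star((0, 0), (2, 0), grid_neighbors,
              h=lambda p: abs(p[0] - 2) + abs(p[1] - 0))
```

With an admissible heuristic like Manhattan distance, A* returns the same optimal path Dijkstra would, while expanding fewer nodes.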
about searching in that space how do you get from one location space to another [00:25:28] get from one location space to another location or how do we get from one robot [00:25:30] location or how do we get from one robot configuration to another robot [00:25:32] configuration to another robot configuration [00:25:34] configuration following search the week after we're [00:25:36] following search the week after we're going to talk about mvps and games right [00:25:38] going to talk about mvps and games right mvps or markup decision processes [00:25:41] mvps or markup decision processes basically the idea there is the world [00:25:43] basically the idea there is the world has uncertainty and we should actually [00:25:45] has uncertainty and we should actually model those probabilities and [00:25:47] model those probabilities and uncertainties and that commonly shows up [00:25:49] uncertainties and that commonly shows up when you think about robots interacting [00:25:51] when you think about robots interacting with each other or with the world in [00:25:52] with each other or with the world in general right like when you have dynamic [00:25:54] general right like when you have dynamic environments around you when you have a [00:25:56] environments around you when you have a self-driving car driving right next to [00:25:58] self-driving car driving right next to other cars right that that you can model [00:26:00] other cars right that that you can model that as an mvp and similarly if you [00:26:03] that as an mvp and similarly if you think about this interaction with [00:26:05] think about this interaction with another intelligent agent you can model [00:26:07] another intelligent agent you can model that as a game and then we will be [00:26:08] that as a game and then we will be discussing that in a couple of weeks and [00:26:11] discussing that in a couple of weeks and these ideas show up a lot in robotics so [00:26:14] these ideas show up a lot in robotics so 
[00:26:14] So here, the video on the left basically shows two robots that are trying to coordinate with each other. What they're trying to do is move this rod together, but the interesting thing is that they're decentralized: they don't talk to each other, and they have different observability. The robot in the front can see the books here, and the robot in the back can only see the boxes, and just from the forces, the feel of the forces, they can understand what the other agent is doing. They can learn what the other agent's policy is and coordinate with that agent to do this collaborative type of maneuver.

Here's another example, also from my lab. Here we're looking at two robots playing air hockey, and again we have this game-theoretic
paradigm of two robots trying to coordinate with each other. [00:27:06] There's a bit more learning happening here: the robots are trying to learn the policy of the other agent, or a representation of the other robot's policy, and based on that, kind of trick the other agent, or influence the other agent, and win this air hockey game.

So, okay: MDPs and games pretty much show up for any type of interactive system, and as robots are leaving factory floors, they have more and more interactions with people or with other agents, so these ideas are super useful, again, for the planning part of robotics.

We will see Bayesian networks immediately after that. Bayesian networks, again, are super useful when it comes to things like mapping and estimation. So here there's this algorithm called simultaneous localization and mapping, SLAM, and basically the idea is that when you go to a new environment
and you don't know anything about this new environment, [00:27:57] you're going to sample points, and based on that you're going to create a map, navigate yourself around it, and estimate where you are. So a lot of ideas around Bayesian networks show up here. Actually, in one of the homeworks we're going to look at things like particle filters and state estimation, and, based on that, how we use ideas from Bayesian networks to do better estimation of where we are and where other agents are. Again, that is super useful for any robotic system that tries to do anything in the world, basically.

[00:28:32] All right, so a lot of that was around planning. Logic is another topic we will discuss in this class. It is not used as much in robotics, but it does show up in various places.
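As a preview of the particle-filter idea from the homework, here is a minimal one-dimensional sketch; the motion model, sensor model, and noise levels are made up for illustration:

```python
import math
import random

def particle_filter_step(particles, control, measurement,
                         motion_noise=0.1, sensor_noise=0.5):
    """One predict-update-resample cycle of a toy 1-D particle filter.
    The particle set approximates the belief over the robot's position."""
    # 1. Predict: push every particle through the (noisy) motion model.
    moved = [p + control + random.gauss(0.0, motion_noise) for p in particles]
    # 2. Update: weight each particle by the likelihood of the sensor reading
    #    under a Gaussian sensor model centered on the particle's position.
    weights = [math.exp(-(measurement - p) ** 2 / (2 * sensor_noise ** 2))
               for p in moved]
    # 3. Resample: draw a new particle set in proportion to the weights.
    return random.choices(moved, weights=weights, k=len(moved))

random.seed(0)
# Belief starts out uniform: we have no idea where the robot is on [0, 10].
particles = [random.uniform(0.0, 10.0) for _ in range(2000)]
true_pos = 0.0
for _ in range(6):
    true_pos += 1.0                              # robot drives forward 1 unit
    reading = true_pos + random.gauss(0.0, 0.3)  # noisy position sensor
    particles = particle_filter_step(particles, control=1.0,
                                     measurement=reading)
estimate = sum(particles) / len(particles)       # posterior mean of belief
```

After a few sense-and-move cycles, the particle cloud collapses around the robot's true position, which is the "estimate where you are" step of SLAM in miniature.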
[00:28:46] So here, this is actually work by Kress-Gazit's group, and this is called LTLMoP, which is basically a tool. The idea here is that they try to get this robot to navigate the space and go to various squares here, and while doing that, it tries to satisfy some logical formula. So instead of giving it an objective, a loss function, and then doing, let's say, gradient descent on that to try to come up with a policy, what this robot does is take that logic formula and, based on it, create a plan for how to navigate this space. And why would anyone want to do that? Well, the reason is that if you have that logic formula, you can actually prove things about this robot: you can actually prove whether it would or would not satisfy the specification. And again, that is very useful when you think about the safety of, let's say, autonomous cars.
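The "prove things about the robot" idea can be illustrated by checking a finite plan against two classic temporal properties, "always avoid the obstacle" and "eventually reach the goal"; this toy checker and its grid plans are invented for illustration and are not the LTLMoP tool itself:

```python
def always(pred, trace):
    """G p: the predicate holds at every step of the (finite) trace."""
    return all(pred(state) for state in trace)

def eventually(pred, trace):
    """F p: the predicate holds at some step of the trace."""
    return any(pred(state) for state in trace)

# Hypothetical workspace: one obstacle cell and one goal cell on a grid.
obstacle = {(1, 1)}
goal = (2, 2)

# A plan is just the sequence of grid cells some planner produced.
safe_plan = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
bad_plan = [(0, 0), (1, 1), (2, 2)]       # cuts through the obstacle

def satisfies_spec(plan):
    """Spec: always avoid the obstacle AND eventually reach the goal."""
    return (always(lambda s: s not in obstacle, plan)
            and eventually(lambda s: s == goal, plan))
```

Tools in this space go further: rather than checking one trace after the fact, they synthesize a controller guaranteed to satisfy the temporal-logic formula on every execution.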
[00:29:34] And again, that is very useful when you think about the safety of, let's say, autonomous cars: if you want to prove that your autonomous cars are safe, you would need to add a little bit of logic there; you would need to think about how that can be used in planning. It also helps with transparency, because there is a smaller gap between, let's say, natural language and temporal logic, which is the logical language they're using here, and that smaller gap can give us a more transparent, clear idea of what the robot is doing. Okay.

[00:30:05] All right, so that was all planning and the topics we are discussing in this class. We are currently talking about machine learning. A common place that machine learning shows up in robotics is on the sensing side of things: you sense the world, and based on that you perceive the world. Perception and vision are a big part of robotics, and a lot of that is done using machine learning these days.
[00:30:31] So you have a machine learning network that basically tries to do object recognition and activity prediction of what the other objects around you are doing, or, in this case I think, who the owner of the car is, and what the other objects around us are, and things of that form. That's a very common place machine learning shows up in robotics: the sensing and perception side of things.

[00:30:56] And you might ask about this acting part: what goes into the acting part? It's not just AI that shows up in robotics; a bunch of other fields show up too. Specifically, control theory and optimization are the core of the acting component of this sense-plan-act architecture. The idea is that you might again have an objective, like following a trajectory, and you actually want to apply the right control, the right accelerations and steering angle, to your autonomous car in this case, to get it to navigate here.
[00:31:30] A lot of that, the core of it, is actually done using ideas from control theory. More recently, people have been using ideas from machine learning here too, like adding ideas from deep learning for actually planning and getting the car to navigate in this space.

[00:31:46] So those are some of the core ideas. As I mentioned earlier, there are some other paradigms besides sense-plan-act. One specific paradigm, which I think is pretty interesting, is to use machine learning to do all of sense, plan, and act by trying to learn from humans. This is commonly referred to as learning from demonstrations, or imitation learning, and the idea is that if I just watch how humans do things, then from that I can directly figure out what their objective was or what their policy is.
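The trajectory-following control mentioned a moment ago can be illustrated with a minimal PD (proportional-derivative) controller on the steering angle. The gains, limits, and one-line "car model" below are toy assumptions, not how a real autonomous car is controlled:

```python
def pd_steering(cross_track_error, error_rate, kp=2.0, kd=0.5, max_angle=0.5):
    """PD control: steer against the cross-track error and its rate of change,
    clipped to a (made-up) physical steering limit in radians."""
    angle = -kp * cross_track_error - kd * error_rate
    return max(-max_angle, min(max_angle, angle))

# Toy simulation: the car starts 1 m to the side of the desired path,
# and (as a crude model) the error just integrates the steering command.
dt = 0.1
error, prev_error = 1.0, 1.0
for _ in range(100):
    rate = (error - prev_error) / dt
    prev_error = error
    error += pd_steering(error, rate) * dt

print(abs(error) < 0.05)  # True: the car has settled back onto the path
```

The derivative term damps the correction so the car does not oscillate around the path; real controllers add an integral term, a vehicle model, and much more.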
[00:32:21] This idea has been around since the 2000s in robotics. The work on the left that I want to show is this idea being applied directly to robotics: this is work that Pieter Abbeel and Andrew Ng did in 2004 at Stanford. Basically, there are these helicopters, and before then it was really hard to fly them by just using AI and control; it was actually just really hard to fly them. And then there are these pilots who can fly these helicopters much more easily, much more simply. So the idea was: could we use that here, to get this helicopter to fly in this space?

[00:33:16] By watching a human pilot fly, it then learns to fly by itself.
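The helicopter work used apprenticeship learning (inverse reinforcement learning over learned dynamics); the simplest member of the learning-from-demonstrations family is behavior cloning, which just does supervised learning from expert state-action pairs. Below is a minimal sketch with an invented linear "expert"; everything here is made up for illustration:

```python
import random

def fit_linear_policy(states, actions, lr=0.1, epochs=200):
    """Behavior cloning as least squares: fit action = w . state
    to (state, action) pairs recorded from an expert, by gradient descent."""
    dim = len(states[0])
    n = len(states)
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for s, a in zip(states, actions):
            pred = sum(wi * si for wi, si in zip(w, s))
            for i in range(dim):
                grad[i] += 2.0 * (pred - a) * s[i] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Invented "expert" that steers with action = 1.5*err - 0.4*err_rate.
random.seed(0)
states = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(100)]
actions = [1.5 * s[0] - 0.4 * s[1] for s in states]

w = fit_linear_policy(states, actions)
print([round(wi, 2) for wi in w])  # recovers roughly [1.5, -0.4]
```

Behavior cloning recovers the expert's policy directly; apprenticeship learning instead recovers the expert's reward and then plans against it, which generalizes better when the learner drifts away from the demonstrated states.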
[00:33:21] So what it does is watch the person fly, and then it tries to fly the same stunt maneuvers by itself, maybe trying a few times until it nails it. What you're seeing is the end result of this machine learning process, called apprenticeship learning.

[00:33:39] So that was this idea of apprenticeship learning, where, kind of for the first time, people were able to fly these helicopters autonomously, by learning from human pilots, from human experts. That idea has been around in research, and people in general have been thinking about how we can learn from humans, how we can get robots to act in the world by directly learning from humans, not just from demonstrations but also from things like asking for their preferences or giving language instructions. In my lab, actually, we're doing a lot of work in this domain, where we're looking at preference-based learning.
[00:34:15] We actively query people for which trajectory they prefer, in order to get a robot to play some version of mini golf here: aiming for one of these balls and getting the robot to actually hit the ball correctly so it ends up in the right goal. And that's kind of exciting, because you can learn from all sorts of human feedback: you can learn from demonstrations, comparisons, language, and from that you are able to get a robot to do an interesting type of maneuver.

[00:34:45] Another place that machine learning shows up is, again, by combining sensing, planning, and acting into one giant box: starting from visual data, can you try to directly get actions on your robot? This idea of robot learning has been very popular in recent years.
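A common way to learn from pairwise comparisons like these is the Bradley-Terry model: fit reward weights so that the preferred trajectory gets higher reward, via logistic regression on feature differences. The features and "simulated person" below are invented for illustration; this is not the lab's actual pipeline:

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def learn_reward_from_preferences(pairs, lr=0.1, epochs=100):
    """Bradley-Terry preference learning: fit reward weights w so that
    P(A preferred over B) = sigmoid(w . (phi(A) - phi(B))).
    Each pair is (features_of_preferred, features_of_rejected)."""
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for fa, fb in pairs:
            diff = [a - b for a, b in zip(fa, fb)]
            p = sigmoid(sum(wi * di for wi, di in zip(w, diff)))
            # Gradient ascent on the log-likelihood of the observed preference.
            w = [wi + lr * (1.0 - p) * di for wi, di in zip(w, diff)]
    return w

# Hypothetical features per trajectory: (smoothness, distance-to-goal).
# The simulated person prefers smooth trajectories that end near the goal.
random.seed(1)
true_w = [1.0, -2.0]
pairs = []
for _ in range(200):
    f1 = (random.random(), random.random())
    f2 = (random.random(), random.random())
    r1 = sum(tw * f for tw, f in zip(true_w, f1))
    r2 = sum(tw * f for tw, f in zip(true_w, f2))
    pairs.append((f1, f2) if r1 > r2 else (f2, f1))

w = learn_reward_from_preferences(pairs)
print(w[0] > 0 and w[1] < 0)  # True: the learned reward has the right signs
```

Active querying then picks the next pair of trajectories to show the person so that their answer is maximally informative about w.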
[00:35:06] This is work from Berkeley from 2015, where basically the idea is to get the robot to just try things out, and then, from these images and the joint angles of what the robot tries, it learns how to achieve the task. It's a very data-intensive kind of process, but there's a lot of excitement, because you can achieve things you weren't able to achieve before.

[00:35:29] And we have seen a lot of advances in machine learning when it's applied to, let's say, NLP and vision, places where we have a lot of data. We don't have that much data in robotics, but if that is the bottleneck, then maybe we can create an arm farm, kind of like in this video here, and just collect lots and lots of data of robots moving randomly inside this box, and from that learn how to grasp any object that we see.
[00:35:59] This is actually work by Google Robotics; Google X has a subgroup, Google Robotics, that does a lot of interesting work on robot learning. It's very data- and compute-intensive, but there's lots of excitement around this idea of directly learning actions from sensing data. Okay.

[00:36:24] All right, so those were all the things I wanted to say on how AI is used in robotics. But AI is not the only thing used in robotics; as you've probably noticed, robotics spans a bunch of different departments. For example, you see robotics in mechanical engineering, and that has a very different view of robotics: that view is usually focused on design and co-design, which is a super important problem.
[00:36:52] If you're thinking about building an arm, or building a hand that needs to do very precise manipulation: what type of sensors are you using? How are you building this system? Those are all really good questions. How do we make sure we build a prosthetic that is not too heavy, that is comfortable, and that is also very safe for the person to use for walking? These are all really great design questions that are super important in robotics.

[00:37:17] Another reason this is important is that there's a whole new area around co-design, which basically says: for whatever hardware we pick, there's going to be some AI algorithm, but if I change that hardware, my AI algorithm is going to change, and if I change my AI algorithm, it could run very differently on different hardware.
[00:37:37] So can we design both of these at the same time, designing what our robot should look like and what algorithms should run on it? Or can we have reconfigurable robots? There's a lot of excitement around this area in general, when you think about design and co-design of these systems.

[00:37:54] I just want to show a few other cool designs that are out there that are very impressive. One design I want to show is this robot on the left. This robot is a tensegrity structure: it basically has a bunch of rigid links, like these guys, and these links are connected by things that are kind of like ropes. It's kind of a funny structure, but the interesting thing about it is that it is shock-resistant. NASA really cares about this robot, because landing robots on the surface of Mars is really difficult.
[00:38:32] But if I just drop this robot, nothing happens to it, and it can just roll around and continue moving forward, which is again a very interesting design for something you don't want to break. Robots in general are pretty rigid, and this robot is very flexible, so there's a lot of interest in building robots that are softer, less rigid, more flexible, and this is an example of that.

[00:38:56] Another robot that I think is kind of fun, and I'll show a video of it a little later, is this robot from Stanford, from Mark Cutkosky's lab, called Stickybot. It's basically a robot that has gecko-like hands: its hands are inspired by the hands of a gecko, with these adhesive pads, and because of that it can climb up walls and really slippery slopes, which is again a very interesting design.
[00:39:25] Another design that I find super interesting is this inflatable snake robot. This is from Allison Okamura's lab, again in mechanical engineering at Stanford, and the idea here is that this robot can inflate itself as it goes through different parts of the environment. It might be really difficult, for example, to go through this hole, but as the robot inflates, it's actually making its way through these kinds of narrow spaces and getting to various areas of the space.

[00:39:54] This can also be used inside the body: for example, when we're doing a more intelligent type of endoscopy, we can send some of these robots inside the body and navigate a little bit better, for robotic surgery, for things like endoscopies, and so on.

[00:40:10] I can take any questions now, actually, about any of this.
[00:40:16] A quick question: you mentioned that robotics encompasses stuff like computer vision, machine learning, control, all working in the same system. So what do you recommend for people who want to go into the field? Do you study all of it, or is there a particular area?

[00:40:34] Yeah, that's a very good question. It kind of seems like a giant thing, right, because it incorporates everything. And the field of robotics in general, when you go to the conferences, is interesting, because you see people from all these different fields, but they're coming together for the same problem, not the same technique, which makes it a very interesting field to be in. But at the end of the day, everyone focuses on their own expertise and then brings it together as part of a team.
[00:41:02] So, for example, robotics in CS at Stanford, and I'll actually talk about that a little bit: there's a lot of focus on developing AI algorithms, the algorithmic side of this, but not as much on the design side. Robotics in mechanical engineering, on the other hand, is very focused on building new designs, new structures. And we do have a lot of joint projects where we use new designs and try to develop new algorithms for them, and lots of collaboration across these different fields. So it is a very interdisciplinary field, but as I said, even though it might seem too large, it's not that large at the end of the day: everyone focuses on the thing they're actually very interested in, with the same goal of building robots.

[00:41:43] I have a question: what subfields of robotics would be really great to go into a startup for right now?

[00:41:51] To start a startup, right, is that what you mean? Yeah, so that's a good question to ask.
[00:41:57] I think there's a lot of excitement around autonomous driving. Autonomous driving these days is very focused on vision, machine learning, and control theory, so those three backgrounds are commonly used there, though again, a lot of big companies are doing it, so it's not necessarily startups. But beyond autonomous driving, people are very interested in domestic robots these days: actually getting robots inside people's houses, which wasn't the case even a few years back. We have robots functioning very well on factory floors, but having robots in our homes is a big problem, and there were some startups that weren't very successful, so it's kind of an edgy area to be in. But I think there's a lot of excitement there, like home automation.
[00:42:45] Things like the next generation of Roombas, or other things that we can have in homes. And again, for that, I think a lot of these systems are using machine learning and AI in general, so that is a huge background you would want to have, but the design is pretty important there too: the type of hardware design you use is actually super important. And healthcare: thinking about robotics being used in healthcare, in hospitals, and things of that form, I think that is also a very exciting area.

[00:43:14] Thank you.

[00:43:16] Want to ask a question?

[00:43:18] Yeah, kind of a follow-up, about robotics research. As you said previously, right now there are a lot of things happening in AI, and there's also a lot happening in control theory, so
is there any research moving these two things together? In my undergrad I studied control theory and also AI, and I found that control theory has some really interesting concepts, like stability, like observability. Is there any research on using those concepts in, for example, reinforcement learning, in learning-based systems, which are missing things like stability?

[00:44:06] Yeah, there's a lot of excitement actually around that, and I totally agree: there are a lot of interesting topics in control theory, and a lot of interesting topics in AI and machine learning, that are just making their way into robotics.
clashes [00:44:23] there was a little bit of like clashes between them i would say but now i think [00:44:25] between them i would say but now i think there's a lot of like coming together [00:44:26] there's a lot of like coming together and trying to combine those ideas [00:44:28] and trying to combine those ideas there's a new conference called learning [00:44:30] there's a new conference called learning for uh dynamics and control of what we [00:44:33] for uh dynamics and control of what we see and the whole point of that is [00:44:35] see and the whole point of that is actually bring in learning people and [00:44:37] actually bring in learning people and control people together to try to use [00:44:39] control people together to try to use the same ideas yeah lots of research [00:44:41] the same ideas yeah lots of research trying to bring learning and control [00:44:42] trying to bring learning and control together and i think that is actually [00:44:44] together and i think that is actually the right direction because as you said [00:44:46] the right direction because as you said lots of interesting ideas in dynamics [00:44:47] lots of interesting ideas in dynamics and control and i think a lot of those [00:44:49] and control and i think a lot of those ideas could be used as prior structures [00:44:53] ideas could be used as prior structures uh that could be put on learning based [00:44:55] uh that could be put on learning based systems so when you're let's say [00:44:56] systems so when you're let's say training a neural network you can bring [00:44:58] training a neural network you can bring in structure that you know about the [00:44:59] in structure that you know about the system that comes from control theory [00:45:01] system that comes from control theory let's say [00:45:03] let's say [Music] [00:45:06] [Music] i had a question [00:45:07] i had a question sure [00:45:08] sure sure i was wondering you know you talked [00:45:10] sure i was wondering you 
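The idea above, using known control-theoretic structure as a prior on a learned model, can be sketched in a few lines: keep a trusted stable model and learn only a residual correction on top of it. This is a hypothetical illustration with made-up dynamics, not code from the course.

```python
import numpy as np

# Sketch of "control-theoretic structure as a prior": we trust a known stable
# linear model A_prior and learn only a small residual correction from data,
# instead of learning the full dynamics from scratch. All numbers are made up.

rng = np.random.default_rng(0)

A_prior = np.array([[0.9, 0.1],
                    [0.0, 0.8]])   # known stable model (spectral radius < 1)

def true_step(x):
    """The 'real' system: the prior plus a small unknown nonlinear term."""
    return A_prior @ x + 0.05 * np.array([np.sin(x[1]), 0.0])

# One-step transition data (x_t, x_{t+1}).
X = rng.normal(size=(200, 2))
Y = np.array([true_step(x) for x in X])

# Fit only the residual Y - X A_prior^T with ridge regression; the penalty
# keeps the learned part small, so the stable prior dominates the model.
R = Y - X @ A_prior.T
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ R)

err_prior = float(np.sum((Y - X @ A_prior.T) ** 2))            # prior alone
err_model = float(np.sum((Y - (X @ A_prior.T + X @ W)) ** 2))  # prior + residual
assert err_model < err_prior   # the learned residual strictly improves the fit
```

The ridge penalty guarantees the fitted model is at least as accurate on the training data as the prior alone, while keeping the learned correction small, which is one simple way "structure from control theory" can anchor a learned system.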
[00:45:06] I had a question.

Sure.

I was wondering: you talked about arm farms and collecting lots of data. Do you feel like the field is more data-limited, or more algorithm- and learning-limited? I think about when I learned to drive: it wasn't that much data, it was just maybe a couple weeks of practice, and then I was ready.

[00:45:30] That's a very good point. I think it's a combination. I do think the field is very data-limited, and it's interesting because, yes, when you learn to drive you spend a couple of weeks, but you have seen cars drive right next to you. Learning by observation is a very interesting type of learning: you're not learning by doing, you're learning by observing other people driving right next to you, and that has so much information in it. It's kind of the same problem I mentioned earlier with the house-of-cards example: your autonomous car doesn't know that it's important, but you would, because you have so much context about the world. Because of that, I think the issue, specifically for autonomous cars, is some of these corner cases. Driving on a highway, that's basically solved; the issue is the corner cases the system hasn't seen yet in the data, and maybe more data will solve that. So I think more data is definitely needed. I think we can still do better on our algorithms too, but data is, I would say, the bigger issue, at least in autonomous driving.

[00:46:34] On the data side, do you feel that synthetic data could be something useful for machine learning applications, or is that something that's always going to be a fantasy?
[00:46:45] I think it's super useful. If you can create near-accident driving scenarios, train your car in those settings, and generate that data automatically, that would be super useful; then you don't need to wait forever to see a near-accident scenario on the vehicle. I think one issue there is the simulation-to-reality gap, which is a big problem specifically for robotics, but I do think generation of data is important, yeah.

[00:47:10] Hi, I have a question regarding the determinism of robotics. In terms of machine regulations, my government usually requires the actions of a machine to be stated, and the manufacturer of the machine is sometimes responsible for those actions. But machine-learning-based or deep-learning-based algorithms are statistical by definition, so how do we define the responsibility of the manufacturer of the machines? And if there's an accident caused by an autonomous driver, who should be responsible in that case?

[00:47:48] Yeah, that's a very good question, actually a very good question for Tino next week. So, on the first point you made, that all the laws are about deterministic systems: that's actually not always the case. For example, Mykel Kochenderfer in the Aero/Astro department has done a lot of work around POMDPs that actually run on aircraft systems. There's this ACAS X system for unmanned aircraft; all the motions, landing and taking off and all of that, are done in an online setting, but the system is small. It's a POMDP with a few states, and you can verify everything about it.
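As an aside, the belief tracking at the heart of a POMDP can be sketched in a few lines. All numbers below are made up for illustration and have nothing to do with the real ACAS X models.

```python
import numpy as np

# Minimal sketch of a discrete POMDP belief update. The agent never observes
# the state directly; it maintains a belief b(s) and updates it after each
# action/observation pair:  b'(s') ∝ Z(o | s') * sum_s T(s' | s, a) * b(s).
# States, transitions, and observation likelihoods here are illustrative.

T = np.array([[0.9, 0.1],    # T[s, s'] under some fixed action
              [0.2, 0.8]])   # states: 0 = "clear", 1 = "intruder nearby"
Z = np.array([[0.85, 0.15],  # Z[s', o]: observation likelihoods
              [0.30, 0.70]]) # observations: 0 = "quiet", 1 = "alert"

def belief_update(b, obs):
    """One Bayes-filter step: predict through T, correct with Z, normalize."""
    predicted = b @ T               # sum_s T[s, s'] * b[s]
    unnorm = predicted * Z[:, obs]  # weight by observation likelihood
    return unnorm / unnorm.sum()

b = np.array([0.5, 0.5])            # start maximally uncertain
for o in [1, 1, 1]:                 # three "alert" observations in a row
    b = belief_update(b, o)

# Repeated alerts should make "intruder nearby" the more likely state.
assert b[1] > 0.5
```

Because the belief lives on only a few states, exhaustively checking properties of a policy over this model is feasible, which is what makes the verification story for small POMDPs so much easier than for large neural networks.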
So there's a lot of interesting work around verification and validation in this space, and even if a system is not deterministic you can still verify it. That small POMDP, a partially observable Markov decision process, is something you won't see in this class; if you're interested in that topic, take Mykel's class. But when it comes to neural networks, yeah, we don't really have that many guarantees around them, and there's a lot of discussion here. Some people are taking the route of trying to prove things and verify neural networks; Clark Barrett is someone in the CS department who does a lot of work on verification of neural networks, but again we are limited in size there, so we can't have giant neural networks. Another kind of perspective on this is giving statistical guarantees.
If my autonomous car is statistically safer than a human, maybe that is good enough, and we're okay with some number of accidents some number of times. And some of it is an acceptance issue too, right? The first aircraft that were out there were probably not safe, and people were probably okay with that; the number of deaths was higher. I think there's a little bit of that acceptance question in how this is going to pan out. But it is actually a very good question how we are going to regulate this and who is going to be to blame. One can take Tesla's approach and say a human is always in control, so if anything happens it was the human's fault, which is kind of a weird type of approach, I would say, and not necessarily the safest way to go. But yeah, I honestly don't have good answers for this; it's something to ask Tino next week.
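To make the statistical-guarantee idea concrete, here is a back-of-the-envelope bound on what test driving can actually certify. The human baseline below is an assumed round figure (roughly one fatality per hundred million miles), not a sourced statistic.

```python
import math

# With zero accidents observed in n independent trials (here, miles), the
# exact one-sided 95% binomial upper bound on the accident rate p solves
# (1 - p)^n = 0.05, which gives p ≈ 3/n (the "rule of three").

def upper_bound_zero_failures(n, confidence=0.95):
    """Exact one-sided upper confidence bound on p after n failure-free trials."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

human_rate = 1e-8            # assumed baseline: ~1 fatality per 1e8 miles
miles_tested = 10_000_000    # 10 million failure-free test miles

bound = upper_bound_zero_failures(miles_tested)
print(f"95% upper bound: {bound:.2e} per mile")   # ~3e-7, still 30x the baseline

# Failure-free miles needed before the bound even reaches the human rate:
miles_needed = math.log(0.05) / math.log(1.0 - human_rate)
print(f"miles needed: {miles_needed:.2e}")        # ~3e8 failure-free miles
```

The point of the exercise: even ten million failure-free test miles only bound the rate at about 3e-7 per mile, so demonstrating "statistically safer than a human" by test driving alone takes on the order of hundreds of millions of miles, which is part of why this argument is contentious.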
[00:50:03] I have a question with regards to the co-design. You mentioned that the hardware and the AI algorithm need to go hand in hand. For example, the self-driving car's algorithms have to make quick decisions with real-time changes in the environment, and the algorithms can take a long time to run. On the hardware side, there are a variety of ways the algorithm can be deployed, say on a GPU or on other platforms. So are there any pointers toward how these two go hand in hand, and what is best?

[00:50:52] So I was thinking about it more in an offline fashion. In an offline fashion, you can have a fancy algorithm that does everything but takes a lot of compute, running on hardware that is very simple, or you can increase the complexity of your hardware and, on the other hand, have a really simple algorithm that runs on it. I wasn't really thinking about the online aspect, where you're right, they're running at different frequencies, so how could they work together. One example of this tradeoff is assistive robotics, assistive teleoperation, where you're using a joystick to control a robot arm; this is something people work on commonly. You can make the hardware very intricate, like using haptic devices, and then be able to control things much more easily. On the other hand, you can have hardware that's really simple. For example, there are these sip-and-puff devices that a lot of patients with disabilities use. It's a very simple device, you can only sip and puff, that's the only thing you can do, but then the algorithm underneath needs to be much more complicated to capture what that sip-and-puff means. So that's one place this interplay between hardware and algorithm really shows up.

[00:52:05] All right, thank you guys. Okay, so let me continue a little bit, and then I'll stop at the end of this section and I can take more questions too; after that I want to show some of these applications at some point. This section is small: the robotics-at-Stanford one. Okay, so we talked about all of these, we talked about the history. Robotics at Stanford had an interesting history too, so it does have an old history. Here's a video that I just wanted to show for fun. This is a video from Oussama Khatib's lab, and Oussama always has the best videos. He has
these two robots, Juliet and Romeo. The other fun thing in this video for me is that lots of these people are now faculty, or very famous in the field, so it's fun to see them as grad students at Stanford. This is Oliver Brock. And this is Gates: if you look at it closely, it's the first floor of Gates, and it hasn't changed much. This robot is actually still on the first floor of Gates, so if you get a chance to go there, it's still sitting there. This is basically getting a robot helper to help you do various sorts of things, like move objects for you. They have basically two of these; let me move forward a little bit. So it helps you carry objects and things of that form. It's a very old robot, and some of the concepts you study now, even thinking about interaction between humans and robots, they were actually thinking about back in the day. This is collaborative transport: these robots are not decentralized, they actually have centralized control, but they are compliant, meaning they're not rigid; if you move one, it moves with you. And then later on there's this video of dancing with Romeo; again, it's compliant, and it kind of moves around you.

[00:53:57] So those were some old videos from Stanford robotics. A more recent video of Stanford robotics successes, I guess, is about the DARPA Grand Challenge. It's not that recent; it's from 2005. The DARPA Grand Challenge was a competition that DARPA put out, basically trying to get researchers to work on autonomous driving. This was the 2005 competition where Sebastian Thrun was heading the Stanford team, Stanley was the vehicle, and it actually won the competition.

[00:54:49] (Video) "In case you don't recognize it, that is a Volkswagen Touareg in the finish-line configuration. Ladies and gentlemen, boys and girls, it's been done." [Applause]

[00:55:13] So that is Stanley crossing the finish line. After this, Sebastian Thrun actually left Stanford, joined Google, and started the Google self-driving car group, now Waymo. There have been lots of advances in autonomous driving since then, but this was one of the big successes of Stanford robotics, winning the DARPA Grand Challenge, which is
very exciting. But in general, robotics at Stanford falls into a bunch of different departments. In computer science, here are some of the faculty; I just wanted to show their faces so you know who they are and can take classes from them later on. Oussama Khatib: I've already shown a lot of videos from his lab, and I have one more that I'll show later. He also does a lot of work around field robotics, meaning "I'm going to send robots to places that humans haven't seen before and see what happens," which is really exciting. Then we have Ken Salisbury, who does a lot of work around helper robots, building systems that can actually help people. Silvio does a lot of work around vision and robotics: he's primarily a vision faculty member, but he's thinking about that intersection of vision and robotics. And then some of the more recent people who have joined, including myself, Jeannette, and Chelsea: Jeannette does a lot of work around manipulation; I am personally very interested in interaction, so thinking about multi-agent interaction or interaction with humans; and Chelsea does a lot of work around robot learning, meta-learning, and things of that form.

In addition to these faculty, there are other folks in the CS department whose work is closely related to robotics. Fei-Fei does quite a bit of work in vision but is also interested in that robotics intersection. And Karen Liu and Jiajun Wu, who both recently joined Stanford, do a ton of work around physical simulation, graphics, things of that form, and that has a lot of relations to building robots that can work with deformable objects and things of that form.
And some folks who used to do robotics, I guess, are Andrew and Sebastian. I showed a video of Andrew's learning-from-demonstration work earlier, the helicopter flying video, and Sebastian has done a lot of work in autonomous driving. They're both still around: Andrew does a lot of work in healthcare these days, and Sebastian comes in as an adjunct faculty member now.

Outside of computer science, we still have a lot of robotics faculty. In the Aero/Astro department we have Grace Gao, Mac Schwager, Marco Pavone, and Mykel Kochenderfer; I mentioned Mykel's work around aircraft systems earlier, building these ACAS X systems and trying to prove properties about them. They all do a lot of work around drones, quadcopters, helicopters, things of that form, multi-agent systems, and being able to get guarantees and talk about risk.

And finally, in mechanical engineering we have a good number of faculty: Allison Okamura, Sean Follmer, Mark Cutkosky, Steve Collins, and Monroe Kennedy. Almost all of them do quite interesting work on design too, building systems that are actually interesting and useful. The sticky-bot I showed earlier was from Mark's lab, and the snake robot was from Allison's lab. Sean does a lot of interesting work at the intersection of robotics and HCI, so if that is something you're interested in, you should check out what these faculty teach. So that was my very quick robotics-at-Stanford overview. Let me spend another five minutes showing some of these applications, and maybe after that I'll take questions for the last five minutes. And I have a seven-
minute video that I'll just leave up after class for you to watch; it's a 50-year history of robotics, which is kind of fun. [00:58:57] All right, so I wanted to show you some exciting applications of robotics. I actually had a hard time classifying them, because they can be classified along different axes, but I ended up putting them into three main groups. [00:59:15] The first group is bio-inspired robots, which is basically: let's look at biology and try to build robots that are useful, so a lot of interesting design goes on there. Another interesting direction is soft robotics, meaning building systems that use soft materials; the tensegrity structure was an example of that. They're flexible, they're soft, they're not rigid, they're not
going to break. I'm not actually going to talk much about soft robots, but I am going to talk about manipulating soft objects, which is a very difficult algorithmic question. And then finally, if I have time, I will talk a little bit about domestic and interactive robots, which I think is really exciting: this interaction with humans is something you should really care about as robots start to interact with us. [01:00:03] All right, so bio-inspired robots. This is more of an interesting design question. From early on, everyone was interested in humanoids, because you want robots to look like humans for some reason, so there's a lot of work around building robots that look like humans, that is, they have two arms, two hands, two legs, and a face. But at some point people realized robots don't need to look
like humans, and they started looking at nature in general and thinking about bio-inspired robots more broadly. There are a lot of animals that can get to places humans can't, and we can build robots that are similar to them. [01:00:45] Another interesting topic that shows up here, specifically under humanoids, is this idea of walking. People have been obsessed with walking for years now, and it's an interesting problem: if you want to build a robot that walks like a human, that is still very difficult. Walking robots have weird gaits; they don't really walk human-like, and when they do, they're just super inefficient. Humans are amazing at walking, and getting robots to walk is in general a very active area of research. Why do we care about that? Well, first off, it's an
interesting question; second, building exoskeletons, systems that can help people walk, has always been an interest in this field. [01:01:27] So let's look at a few bio-inspired robots; I mostly just want to show videos of these systems. One type of bio-inspired robot comes from looking at insects like cockroaches and trying to build robots that act like cockroaches, because they're amazing at getting through obstacles. There's a team at UC Berkeley, Ron Fearing's group, that designs robots that are similar to cockroaches, and they go through places like cockroaches. The nice thing is that they're very agile and they get through things; the other thing is they're small and can be super fast, so you can have a swarm of these robots get to places quickly. Another interesting thing about cockroaches is that when they navigate, they use their antennae, so
that is actually how they figure out where the wall is: using the antennae, they kind of follow the wall. People in Ron's group have been using similar ideas to sense the world and actuate in it, and they even build these robots using origami, which is interesting because it keeps them small and light and they don't take as much battery power and energy. Let me actually move to this one. [01:02:44] The other bio-inspired robot, which I showed you a little earlier, is the sticky bot from Mark Cutkosky's lab. Basically, if you look at gecko feet, they have these very, very tiny suction cups that attach to glass, and this robot uses a similar type of paradigm, and it can just
walk up really slippery slopes, like polished granite, which is super impressive. So lots of cool design going on here. [01:03:20] Similarly, snake robots and eel robots are very popular; they have a lot of links connected to each other and can navigate easily. This is an eel robot, an underwater robot; it has a camera in the front and navigates based on that, which is kind of cool. [01:03:42] And then this is a hopper, again a robot from Ron's group, Salto, which jumps around like a bush baby. It's kind of cool. If you have used things like MuJoCo, which is a simulation environment, you might see these random animals in it; part of the reason is that roboticists care a lot about different types of animal motion, like swimming, hopping, and
that's why those show up in MuJoCo-type environments, which is a physics simulator that lets you train things in simulation. [01:04:19] All right, so those were bio-inspired robots. As I mentioned earlier, we've been obsessed with humanoids, so lots of energy and money goes into building them. Honda actually spent a lot of money building this robot called ASIMO. Its walking is kind of weird. [Music] "I am happy to be here with you today. Thank you. I'm excited to be here in Washington, D.C." [Music] [01:05:03] All right, I'm going to cut it there. But yeah, humanoids have been really exciting. One other example I want to show is sending humanoid-type robots to places that you wouldn't be able to send people before, so I want to show a
video of this robot from Oussama Khatib's group, OceanOne; some of you might have seen this video. This is not a full humanoid, but it does have two arms, so you can teleoperate it and get the robot to do various things, and it goes underwater to places that people have not been able to go before. So let's just quickly watch this video; it's a nice video. [Music] [01:05:58] OceanOne is aimed at bringing a new capability for underwater exploration. The intent here is to have a diver diving virtually, creating a robot that can be the physical representation of the human: a robotic diver that has bi-manual capabilities. So it has two hands, it has stereo vision, and the most amazing thing about it is that you can feel what the robot is doing while sitting up on the boat. And
this is combining the technology of haptics, that is, the idea that we can reflect the contact forces. It's almost like you are there; with the sense of touch you create a new dimension of perception. This robot is oil-filled, which allows us to take the robot very deep; this robot can go to thousands [of meters]. This is the Stanford [dream]: a truly human-like machine that is also human-friendly. [Music] [01:07:09] A shipwreck located about 20 miles off the coast of Toulon in France, at 100 meters. In the last year we have been working on getting our robots ready to take on that expedition. We are going to land on the Moon. [Music] More than 70 percent of the surface of the planet is water; we have a lot of structures, a lot of [unclear] to monitor. We need to reach down there; you can think about it as a solution. [Music] [Dorsa] Since I don't have that much time, I'm going to move forward, because probably
the last thing I want to show is the walking video, and then I'll close and take questions. In general, if you're interested in underwater robots, Oussama does a lot of work around that; he's at Stanford, he teaches classes, talk to him. I think that's a very interesting direction. [01:08:13] And then finally, the last video I can show under this category is this idea of walking, jumping, and things of that sort. There's a lot of excitement around that, and Boston Dynamics, the first video I showed you of the dancing robot was also from Boston Dynamics, does a lot of work on building very dynamic robots; they have really good controllers for these robots. [01:08:36] This is Atlas, from Boston Dynamics. It jumps. It flips, even. That is super impressive. And usually roboticists show that one video of it working and don't talk
about videos of it not working, but more recently people are showing more videos of things not exactly working. In this video it's actually super impressive how it recovers, because that is really hard to do in real time. That was a failure. So, okay, lots of excitement around these areas. [01:09:39] I can start taking questions now; I'm not going to show more videos. I have more to show, but let me just answer a couple of questions, and at the end I'll leave this video up, 50 Years of Robotics, that Oussama put together; it has fun music. So, any questions? [01:09:59] [Student] I had a question. I was super impressed by the last video you showed of the robot doing flips. It looked really heavy, like it had a lot of materials, and I was wondering why they chose to equip it with such materials. I thought, you know, maybe using
lighter materials so it could be easier to jump. I was just curious if you knew the reason behind how they designed that robot. [Dorsa] That's a very good question. I don't know the details, because I don't personally work much on walking or the design side of things, so I actually don't know what material they're using. They definitely do consider different types of material and make sure it's lightweight and all that, but I think there are just a lot of joints and a lot going on with that robot. If you're interested in learning more, check out Boston Dynamics' website; they have all their other cool robots there. [01:10:54] [Student] I have a question. I was wondering, what are some examples of state-of-the-art research involving both robotics and NLP, or language-based AI, for example voice-activated
systems or related work? [01:11:11] [Dorsa] Yeah, there's a lot of excitement around NLP and robotics. I actually have a student jointly advised by me and Percy, which is very exciting; this is our first time doing NLP and robotics together. There's a lot of work around instruction following, basically making teaching interactive: when you have a person and a robot at home, how would the robot learn that you care, or don't care, about the house of cards, that type of thing. So thinking about human-robot interaction a little more carefully when you actually have access to NLP, that is one place it shows up. Another place that is a little harder to think about, but I think has a lot of value, is the large data set of natural language that we have. Generally we have a
lot of text data, and if you can learn from that something about context, something about how a robot should cook an egg, I think that is very interesting too. I haven't seen that much work around it, but again, lots of excitement at that particular intersection of NLP and robotics. [01:12:19] [Student] Hi, I had a question as well. Maybe this goes more into the scope of visual recognition, but robots will be playing a part in this too. The world unfortunately will always consist of good actors and evil actors, and for international security purposes there will be a role, if there already isn't one, for robots and autonomous systems. But those same methods can unfortunately also be used for human rights violations. How do you build it? Any technology will always be neutral; it's its use that determines the outcome. But in the case of human rights
violations, how can you build systems so that, against an authoritarian regime that will have the best technology, there would be a way of using technology to evade it? I've seen some work around how to fool facial recognition. How can technology work against technology when it's needed and also serve its purpose? I think it's a tough question. [01:13:12] [Dorsa] It is a very tough question. I'll refer that to Tino next week specifically, but in the case of vision, and in general in the case of using machine learning, I think it is much tougher. In the last lecture I will talk a little bit about this idea of fooling neural networks. There's some recent work basically showing that you can always find adversarial examples, so this idea of trying to safeguard your system so it doesn't get affected by
adversarial examples is just not going to work; there are proofs, by [name unclear] here and others, who have actually shown that you can always find an adversarial example in some settings, under some distance metrics. [01:13:52] In the case of robotics, I've been part of some discussions around this idea of autonomous weapon systems, as I mentioned earlier. In those discussions there are proposals on, for example, the number of drones that can be purchased at the same time, things of that sort. A big concern there is autonomous weapon systems becoming weapons of mass destruction, which is kind of scary as I talk about it. But the discussions around that are about what sort of limitations and regulations can be put in place so that people don't buy too many drones at the same time and weaponize them, things
of that sort. But I'm definitely not an expert in this; I refer you to Tino next week for more details. [01:14:38] [Student] Thank you. Thanks so much. I have a question: since robotics is such an integrated subject, it integrates mechanical engineering, artificial intelligence, power management, and also regulations, what is the biggest limiting factor that prevents robotics from affecting everyone's life, from being widely adopted? [Dorsa] Dealing with uncertainty is still so difficult, right? You have robots on factory floors, confined spaces where they can move around easily; but putting a robot in a world where humans are just walking around it, there are so many reasons a human could walk around it, and figuring
out what those reasons are can [01:15:24] figuring out what those reasons are can be really difficult so in general [01:15:25] be really difficult so in general dealing with uncertainty dealing with in [01:15:28] dealing with uncertainty dealing with in the case of autonomous driving or [01:15:29] the case of autonomous driving or dealing with things like near accident [01:15:31] dealing with things like near accident scenarios that it hasn't seen before uh [01:15:34] scenarios that it hasn't seen before uh all of those like that uncertainty is [01:15:36] all of those like that uncertainty is like a big factor that's not allowing us [01:15:38] like a big factor that's not allowing us to have robots like out there like [01:15:41] to have robots like out there like widely used in our everyday lives [01:15:44] widely used in our everyday lives so given the current ai technology [01:15:46] so given the current ai technology mostly are based on learning algorithms [01:15:48] mostly are based on learning algorithms but if you keep doing the learning [01:15:50] but if you keep doing the learning algorithm that means you can only like [01:15:52] algorithm that means you can only like learn existing behaviors uh so in order [01:15:55] learn existing behaviors uh so in order to deal with these uncertainties are [01:15:57] to deal with these uncertainties are there any efforts to deal with [01:15:59] there any efforts to deal with uncertainties in life or do something [01:16:00] uncertainties in life or do something like a self-generated motion or [01:16:02] like a self-generated motion or self-motivated actions from the robots [01:16:05] self-motivated actions from the robots itself [01:16:07] itself yeah so yeah definitely yeah so there's [01:16:09] yeah so yeah definitely yeah so there's a lot of work around um like actively [01:16:11] a lot of work around um like actively generating these scenarios active [01:16:13] generating these scenarios active learning in this 
domain so the robot had [01:16:15] learning in this domain so the robot had but the robot still has some sort of [01:16:16] but the robot still has some sort of hypothesis space that it can search in [01:16:19] hypothesis space that it can search in right so uh like you have a hypothesis [01:16:21] right so uh like you have a hypothesis space of things that can happen and [01:16:23] space of things that can happen and within that you can search um and yeah [01:16:26] within that you can search um and yeah so so like there are these things that [01:16:28] so so like there are these things that are called known unknowns and unknown [01:16:30] are called known unknowns and unknown unknowns you can't really do much around [01:16:32] unknowns you can't really do much around unknown unknowns better than like other [01:16:34] unknown unknowns better than like other than just randomly experiencing them but [01:16:37] than just randomly experiencing them but for known unknowns yeah definitely like [01:16:39] for known unknowns yeah definitely like there's a lot of work on actively [01:16:40] there's a lot of work on actively looking for the most informative data i [01:16:43] looking for the most informative data i guess another reason that we don't have [01:16:44] guess another reason that we don't have robots widely used is it's such an [01:16:47] robots widely used is it's such an integrated system and it's such an [01:16:49] integrated system and it's such an interconnected system so you have like [01:16:51] interconnected system so you have like the best ai algorithm and all of a [01:16:53] the best ai algorithm and all of a sudden your camera fails [01:16:55] sudden your camera fails like the hardware failure can affect [01:16:57] like the hardware failure can affect like there's so many things that can [01:16:59] like there's so many things that can fail in that pipeline [01:17:01] fail in that pipeline that makes it just such a difficult [01:17:02] that makes it just 
such a difficult system to debug; it's everything coming together.

[01:17:12] So, all right, I'm going to just leave this video on, and at the end of it I'm going to sign off, because it's fun; it has fun music. Osama again made this; Osama is awesome at making music. But if you have more questions about these things, just come to office hours and I'll be happy to answer them. Let's just watch fifty years of the history of robotics. If you guys want to sign off, sign off too; I'll talk to you later. This is seven minutes. All right.

[01:17:47] [Music video: fifty years of robotics history plays]

[01:24:35] All right, that was kind of long, but kind of fun. So this is for repress 2020. Okay. Do they still have 30 people? Oh my god. Okay. All right.
Good seeing you all. That's it; I'll see you at office hours and in our next lecture. Later, bye.

================================================================================ LECTURE 054 ================================================================================
Stanford Talk: Inequality in Healthcare, AI & Data Science to Reduce Inequality - Improve Healthcare
Source: https://www.youtube.com/watch?v=0IZhDmh1dmI
---
Transcript

[00:00:05] All right, let's get started. Welcome, everyone. We're really pleased to have Emma Pierson with us today. Emma actually comes from Stanford; well, not originally, but she spent her undergrad and grad school years at Stanford. She actually took one of my classes when I first started at Stanford. Emma has done a lot of great work in machine learning, in particular addressing fairness, and my group has done some work in fairness, and we always go and ask Emma because she's kind of our go-to resident expert on the topic. Since graduating, she's been spending a year at MSR New England before starting as an assistant professor at Cornell next year. I'm sure she'll have a lot of interesting and important things to say, along with the general theme that we've been trying to go for in these classes, which is how AI really matters and affects people's lives. So please take it away, Emma.

[00:01:13] Thank you, thank you for this invitation. It's a pleasure to be here, to be back at Stanford, if only virtually, and actually to be back specifically in CS221, which was the first computer science class I ever took at Stanford, so it brings back fond memories. And I'm not just saying that to suck up to the professors.

[00:01:30] Okay, so today I'm going to be giving a two-part talk. In the first part of the talk, I'm going to give an overview of some of the recent projects that I've worked on, discussing the social implications of AI and trying to use it to improve people's lives. Then I'm going to tell a little bit of a story about how I got here, just in case it's useful to you as you're trying to unravel your own professional choices.

[00:01:57] At a high level, as Percy said, I use AI and data science for very practical applications, and the specific applications I focus on are reducing inequality and improving health care. Today I'm going to be talking about using AI to study inequality in three areas. First, I'm going to tell you a story about policing and how we can use AI to study inequality in policing; then I'll talk about using AI to study inequality in pain; and finally I'll talk about using it to study inequality in COVID-19.
[00:02:29] So let's jump right into it: let's talk about policing. This is joint work with a number of excellent co-authors whose names I will now attempt to rattle off: Camelia, Jan, Sam, Dan, Amy, Vignesh, Cheryl, Phoebe, Ravi, and Sharad. It's quite a large project, the effort of a ton of people.

[00:02:52] Why is policing something we care about? I think this year that point doesn't really need to be explained. It's obvious that policing has a tremendous impact on communities across the United States; in fact, it's one of the major leading causes of death for young men, particularly young African-American men. Today I'm going to be talking to you about police traffic stops. Why do we care about police traffic stops? Well, they're one of the most common ways we interact with the police: tens of millions of Americans are stopped every year.

[00:03:21] And there's concern that traffic stops may be racially discriminatory. To be clear about what I mean by racial discrimination (I'll make this more precise in a couple of slides): it's when someone is treated more negatively because of their race. So someone is stopped by police because they're Black; they wouldn't have been stopped had they been driving the same way in the same car but been white, for example. Now, this is obviously very bad if it's happening, but it's hard to test for statistically. Let's talk about why.

[00:03:49] The first challenge we confronted when we embarked on this project is that there was no unified dataset tracking every stop made by the police. Rather, each department stores its data in its own little system, in its own idiosyncratic format. So we set about creating this dataset, and we did so in two stages.
[00:04:09] In the first stage, our journalist collaborators submitted data requests to more than 150 police departments over the course of five years. This was a colossal amount of work for them; journalists are amazing collaborators. Of course, then the data comes pouring in and you have this nightmarish data standardization task, where every single dataset is in a different format. So we put in thousands of hours to clean up the data and put it into a standard format.

[00:04:34] Now, the good news for you is that we've made all this data available. If you're looking for interesting datasets on inequality or on policing, this is a publicly available resource which is easy to download. The full dataset tracks some 227 million stops made across 56 city agencies (that's stuff like the San Francisco Police Department) and 33 state agencies (that would be like the California Highway Patrol). In the main analysis I'll be talking about today, we're going to be analyzing 95 million stops. The reason that number is somewhat smaller is that, for example, we have to filter for departments that have enough data to do this analysis at all: if a department doesn't track the race of stopped drivers, it's very hard to analyze racial discrimination.

[00:05:18] In our analysis we look at three different questions: whether the police discriminate in whom they stop in the first place, whether they discriminate in whom they search after stopping them, and how policy changes affect these things. Today I'm only going to talk about the second question, both because it's particularly interesting from a data science and AI methods standpoint and because the methods I'll be describing are applicable to studying bias in many other human decisions, as I'll describe.

[00:05:47] So: are police searches discriminatory? A little bit of context on police searches. After the police stop a driver, they're allowed to conduct a search in order to find contraband. Contraband here means things you're not supposed to be carrying: illegal drugs, weapons, etc. The purpose of a search is to find contraband; they're not supposed to search you just because they're curious, or because they're trying to harass you, or whatever. So, because the purpose of a search is to find contraband, we're going to test whether minorities are searched when they are less likely to have contraband, that is, at a lower threshold of evidence. So if police are searching white drivers, for example, only when they're 40 percent likely to carry contraband, but they're searching black drivers when they're only 20
percent likely to carry contraband, those different thresholds would be discrimination under our definition of discrimination. Importantly, this is only one way the police can discriminate; there are a lot of other problematic things the police can do, as we've seen this year, of course. We're testing for a very specific type of police discrimination; this is not comprehensive.

[00:06:49] So a first simple test of whether the police are discriminating in whom they search is to look at search rates: in other words, how likely is someone to be searched after a stop? The results of this analysis are shown for our data in the graph at right, with state patrol stops on the left and city stops on the right, plotting the average search rate across locations on the y-axis. You can see that there are very big gaps in this plot, with black and Hispanic drivers much more likely to be searched after a stop than are white drivers.
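As a rough sketch, the search-rate test just described boils down to a per-group conditional mean. The table below is a toy stand-in, not the Open Policing Project's actual schema; the column names and records are assumptions for illustration:

```python
import pandas as pd

# Toy stop records; hypothetical schema and data, purely illustrative.
stops = pd.DataFrame({
    "driver_race":      ["white", "white", "white", "black", "black", "hispanic"],
    "search_conducted": [False,   True,    False,   True,    True,    True],
})

# Search rate: P(search | stop), estimated separately for each group.
search_rates = stops.groupby("driver_race")["search_conducted"].mean()
print(search_rates)
```

A gap between groups here is the raw disparity the talk plots, before any adjustment for differences in underlying contraband rates.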
But this by itself does not prove that the police are being discriminatory, i.e., applying different thresholds on the basis of race. It's possible that some races are more likely to carry contraband (drugs, weapons, whatever). The purpose of a search is to find contraband, so if some groups are more likely to carry it, police may be more likely to search them even in the absence of applying different thresholds on the basis of race.

[00:07:42] So a second simple test that's been proposed to get around this problem is to look not at the rates of searches but at the outcomes of those searches. This is called an outcome test, and the idea is that you look at how likely a search is to find contraband; we call that the hit rate. This was proposed by Gary Becker and other economists; it's decades old, and it's a very frequent test in the economics literature.
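The hit rate behind the outcome test is again just a conditional mean, this time over searched stops only. A toy sketch, with hypothetical column names and made-up counts chosen to echo the 90 percent versus 10 percent example that follows:

```python
import pandas as pd

# Toy records of searched stops only (hypothetical schema, made-up data).
searches = pd.DataFrame({
    "driver_race":      ["white"] * 10 + ["black"] * 10,
    "contraband_found": [True] * 9 + [False] + [True] * 1 + [False] * 9,
})

# Hit rate: P(contraband found | search), per group. Under the outcome
# test, a much lower hit rate for one group suggests that group is being
# searched on weaker evidence.
hit_rates = searches.groupby("driver_race")["contraband_found"].mean()
print(hit_rates)  # black 0.1, white 0.9 in this toy example
```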
[00:08:04] The intuition behind this test is: look, if searches of white drivers are finding contraband 90 percent of the time but searches of black drivers are finding contraband only 10 percent of the time, it suggests that police are searching white drivers only when they're very likely to carry contraband, but searching black drivers on the basis of relatively little evidence, which is indicative of discrimination. So if there are differences in the hit rates by race, that's discrimination under the outcome test. And when you do this analysis on our data, you do indeed see that hit rates are lower for black and Hispanic drivers, in both state stops and city stops, than they are for white drivers, suggesting discrimination against minority groups.

[00:08:43] But it turns out that there's a flaw in the outcome test as well. It's called infra-marginality, and I'm going to illustrate it with a simple
hypothetical example. Totally hypothetical: these numbers are made up. Imagine there are two races, black drivers and white drivers, and imagine that within each race there are two groups: those who are very likely to carry contraband and those who are quite unlikely to. These groups are easy to tell apart; you know, maybe one of them is wearing blue hats. Among the likely group, fifty percent of black drivers carry contraband and seventy-five percent of white drivers carry contraband. Among the unlikely group, five percent carry contraband regardless of their race. And importantly, imagine in this hypothetical example that the police are not being discriminatory: they search everyone who is more than 10 percent likely to carry contraband, applying the same threshold irrespective of driver race.

[00:09:33] What are the hit rates for white and black drivers going to be in this hypothetical example? Well, the police are going to search all the likely drivers, and they're going to end up with a hit rate of 50 percent for black drivers and 75 percent for white drivers. So from that difference in hit rates we're going to conclude that there's discrimination in this hypothetical example. But that's a misleading conclusion, because by assumption we're applying the same threshold to both groups.

[00:09:59] So why is this happening? Why are we getting this misleading result? It's happening because the statistic we're looking at, the probability of carrying contraband conditional on being above the threshold, is not the same as what we actually care about, which is the threshold itself. These are simply different quantities, and the threshold itself is hard to infer: it's not directly measurable from the data the way the hit rate is. So the solution that's been proposed is to use a Bayesian latent variable model to try to infer this threshold.
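The infra-marginality example above can be checked in a few lines. This reproduces the talk's made-up numbers (equal-sized groups are an extra assumption of the sketch) and shows the hit rates diverging even though both races face the same 10 percent threshold:

```python
# Made-up populations from the talk's hypothetical; equal group sizes
# are an added assumption of this sketch.
# (race, group) -> (probability of carrying contraband, number of drivers)
populations = {
    ("black", "likely"):   (0.50, 100),
    ("black", "unlikely"): (0.05, 100),
    ("white", "likely"):   (0.75, 100),
    ("white", "unlikely"): (0.05, 100),
}

THRESHOLD = 0.10  # same for both races: non-discriminatory by assumption

def hit_rate(race):
    """Expected hit rate when everyone above THRESHOLD is searched."""
    searched = found = 0.0
    for (r, _), (p, n) in populations.items():
        if r == race and p > THRESHOLD:
            searched += n
            found += p * n  # expected number of searches finding contraband
    return found / searched

# Same threshold, different hit rates: the outcome test's blind spot.
print(hit_rate("black"), hit_rate("white"))  # 0.5 0.75
```

Only the "likely" group clears the threshold for either race, so the hit rate simply equals that group's contraband rate, and the gap reflects base rates rather than discriminatory thresholds.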
I'll tell you about that now. Before I do, though: are there any pressing questions? And am I talking at an appropriate volume? Cool.

[00:10:40] The threshold test proposes a stylized model of a police stop. When I say stylized, what I mean is that you can never capture all aspects of the real world in math; your hope is that you capture enough relevant aspects of the real world to enable you to measure the quantities of interest. In this case, the thing we want to measure is the threshold at which the search is being conducted. So the goal of this model is to estimate the search thresholds that are consistent with the observed data, namely the search rates and the hit rates. And discrimination, just as before, means lower search thresholds being applied in searches of minority drivers.

[00:11:15] So here's how the threshold test models
here's how the threshold test models a police stop. We imagine that when the officer stops someone, they estimate the probability p that that person carries contraband. p captures contextual factors like the age and the gender of the driver, how nervous they're acting, etc., and it's drawn from a risk distribution, which is shown graphically at right. The risk distribution is a probability distribution on the unit interval, so it ranges from zero to one.

[00:11:44] For example, if the police pull over a bus driver, p is probably quite low, because he's driving kids around; hopefully he's not also carrying weapons or drugs. On the other hand, if they pull over a driver who's acting woozy and drinking out of a bottle, that's pretty sketchy, and p is probably higher.

[00:12:05] Now, in order to fit this model at all, you have to make some
assumption about what the risk distributions look like. You can't fit arbitrary probability distributions, because then you would have infinite degrees of freedom. So the parametric assumption that the model makes is that the risk distributions are beta distributions, which is a very standard distribution on the unit interval.

[00:12:26] Now, if p is greater than some threshold, the officer searches the person, and if they search the person, they find contraband with probability p. So in the case of the bus driver, he'd be below the threshold, so the officer wouldn't search him and wouldn't find contraband. In the case of the woozy-acting driver, he would be above the threshold, so the officer would search him and would find contraband with a 75% probability.

[00:12:52] The model allows the thresholds and the risk distributions to vary by race and location, and discrimination, as before, is if lower thresholds are being applied in
searches of minority drivers.

[00:13:03] Now, this being a Bayesian model, you have to specify how you go from the unobserved objects to the observed data. So what are the unobserved objects and the observed data here? The unobserved objects are the thresholds, which are the main thing we care about, and the risk distributions; graphically, that's the dotted line and the blue line in the figure at right. The observed data are the search rates and the hit rates for each race and location. For example, the search rate for black drivers in Alameda County is 30% and the hit rate is 40%.

[00:13:36] So how do we go from unobserved to observed? I've shown this graphically at right. The search rate is the amount of the risk distribution that lies above the threshold: graphically, it's the amount of gray mass. You can also express it as 1 minus
the CDF of the risk distribution. This is intuitive: it's how much of the risk distribution lies above the threshold. The hit rate is the expected value of the risk distribution conditional on drawing from the gray mass; that is, conditional on drawing from the portion of the risk distribution which lies above the threshold, what's your expected value?

[00:14:10] So that's how we go from the unobserved objects to the observed data; that's the likelihood portion of the Bayesian model. To complete the Bayesian model specification you also need a prior: you need to place priors on your parameters. I'm not going to describe that in detail, but basically you place priors on the thresholds and the risk distribution parameters. Now, by combining those two things, the likelihood and the prior, you can use standard Bayesian inference to
infer the posterior over the parameters. And the specific thing we care about is: what is our best estimate of those thresholds, given our observed data?

[00:14:48] Now, unfortunately, it turns out the story I told you is a little too simple, and fitting the model on a data set of our size is much, much too slow. The reason goes back to the fact that the risk distributions are beta distributions: in order to compute the search rate and the hit rate, you have to compute the CDF and the conditional mean of the beta distribution, and that turns out to be very slow, especially when you also have to compute their gradients, which you have to do during model fitting. The exact mathematical details of why, I'm not going to get into, but the TL;DR is that fitting the entire national data set is impossible, and, perhaps more importantly, the test can't be used by the people who really need it.
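To make the likelihood concrete, here is a minimal sketch of the two quantities the model has to evaluate, assuming a Beta(a, b) risk distribution with made-up illustrative parameters (this is not the paper's code): the search rate is 1 - F(t), the hit rate is E[p | p > t], and both can be cross-checked by simulating stops.

```python
# Sketch of the threshold test's likelihood quantities, assuming a
# Beta(a, b) risk distribution. Parameters are made up for illustration.
import numpy as np
from scipy.stats import beta

a, b, t = 2.0, 5.0, 0.3  # hypothetical risk distribution and search threshold

# Search rate: mass of the risk distribution above the threshold, 1 - F(t).
search_rate = 1 - beta.cdf(t, a, b)

# Hit rate: E[p | p > t], using the identity
#   E[p * 1{p > t}] = (a / (a + b)) * (1 - F_{Beta(a+1, b)}(t)).
hit_rate = (a / (a + b)) * (1 - beta.cdf(t, a + 1, b)) / search_rate

# Cross-check by simulating stops: draw a risk p for each driver,
# search if p > t, and find contraband with probability p.
rng = np.random.default_rng(0)
p = rng.beta(a, b, size=1_000_000)
searched = p > t
found = rng.random(searched.sum()) < p[searched]

print(f"search rate: {search_rate:.3f} (simulated {searched.mean():.3f})")
print(f"hit rate:    {hit_rate:.3f} (simulated {found.mean():.3f})")
```

Note that the hit rate comes out above the threshold t, since it averages over everyone above the margin; that gap between the two statistics is the infra-marginality issue from earlier, and evaluating (and differentiating) these beta CDF terms repeatedly inside the fit is the computational bottleneck the talk mentions.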
Journalists, police departments: anyone who doesn't have a ton of compute and a ton of grad students.

[00:15:38] So what we had to do was replace the beta distributions with a new family of probability distributions called discriminant distributions. Describing those distributions in detail is beyond the scope of this talk, although I'm happy to chat with people afterwards if they're specifically interested in probability distributions. But it turns out that this new family of probability distributions makes the test run two orders of magnitude faster, and that makes it feasible to run on a data set of our size.

[00:16:03] I guess a high-level takeaway here is that probability distributions are not just something you learn in CS 109 so you can pass CS 109; they're actually quite practically important, and it's worth paying attention to them and thinking about what their drawbacks are.

[00:16:18] For now,
though, I'm just going to show you the results: now we can actually take this fast threshold test and apply it to our national data set. So here what I'm showing you is the output of this model, the average estimated threshold, where again we're averaging across locations. And you can see that the average threshold is lower for black and Hispanic drivers than it is for white drivers, suggesting that they're being searched on the basis of less evidence.

[00:16:45] So, to summarize what I've shown you from this search analysis, I've shown you three results: that search rates are higher for minorities, that hit rates are lower, and that thresholds are lower. This is a characteristic pattern for discriminatory searches; you'll see the same pattern, for example, if you look at stop-and-frisk data in New York City, which is a very,
very obviously discriminatory policy. All three tests here suggest discrimination against minorities, but the threshold test does so in a way which is robust to the statistical flaws, like infra-marginality, of the simpler tests.

[00:17:19] I mentioned that the same methods can be applied to other data sets where you have a binary decision and a binary outcome, so I just want to give you some quick examples of this. For example, we can apply it in the medical domain, to COVID testing: the binary decision is, does someone get tested for COVID, and the binary outcome is, do they test positive for COVID. And if you see, for example, that minorities who get tested for COVID are much more likely to test positive, then it's a worrisome sign, because it suggests that they're only getting tested at higher thresholds of evidence; they may be being under-tested for COVID. And in fact we do see
some evidence that that is the case. So this is a more broadly applicable methodology.

[00:18:01] Finally, to close on the public policy impact of this work: I mentioned that one benefit of using this different probability distribution is that your test runs 100 times faster, and this makes it easier for journalists to use. And in fact that was exactly what we saw: the Los Angeles Times was able to take our faster test, with some assistance from our team, and use it to show that black and Hispanic drivers in Los Angeles were being searched on the basis of less evidence. In response to that, within about a week, the LAPD announced that they were going to cut back on police searches, in response to these concerns over racial bias. This is why working with journalists and other real-world actors is nice: they help you translate your research findings into
real-world impact.

[00:18:40] Okay, so before I go on to the second story, are there any questions I should answer?

[00:18:46] Yeah, so we have one question from a student asking: in India, police harass the poor based on how someone is dressed, or for two-wheeler drivers, for example. So can this model that you've been describing be applied based on economic status?

[00:19:05] That's... you know, I've given this talk like 50 times and no one has ever asked that question. That's super interesting; I would be curious to hear more. There is nothing in principle which precludes applying it on the basis of economic status.

[00:19:19] Okay, should I go? That's the only question for now.

[00:19:22] Okay, cool. All right, so let's move to our second story, which is about using AI to study inequality in pain. This is joint work with David Cutler, Sendhil Mullainathan, Ziad Obermeyer, and Jure Leskovec; Jure is a professor here, and he also
prefers black-and-white photos, it would appear. Oh, he's also my academic advisor; I guess this is a relevant point.

[00:19:43] Okay. So a general fact about pain is that disadvantaged groups experience more of it. You see this for socioeconomic disadvantage across a variety of types of pain, across multiple continents, across multiple samples; it's quite a robust finding. And you see it for racially disadvantaged groups as well.

[00:20:00] And this is also true in the condition I'll be talking about today, knee osteoarthritis, which is one of the most common causes of disabling pain in older adults. Mechanically, what's happening is that with the wear and tear of time, the padding between your knee bones erodes, the bones grind together, and this causes a lot of pain. And it's very common; odds are good that multiple people listening to this talk will
develop it.

[00:20:23] So, in osteoarthritis, as in other conditions, disadvantaged groups experience worse pain. A natural explanation is: oh, maybe they just have worse osteoarthritis. But here's the interesting thing, here's the fact we're going to try to explain: it turns out these groups have worse pain even when we control for how severe the doctor thinks their disease is.

[00:20:44] So I want to explain what I mean by that, but in order for that to make sense, I have to explain how we measure severity and pain. So how do we measure severity? Basically, a doctor looks at an x-ray of the knee, grades it on a bunch of factors, and gives it a summary score. Specifically, they'll look at an x-ray of the knee and say stuff like: oh, you definitely have an osteophyte, a bone spur; and you have these other features, like the joint space between your knee bones has narrowed. And
so I'm going to give it a score called the Kellgren-Lawrence grade, or KLG, that ranges from zero to four. It's a categorical summary measure where higher scores indicate more severe disease.

[00:21:21] How do we measure pain? Well, you ask the patient a bunch of questions, like: how much pain do you feel when you're bending your knee? Then we take the answers to those questions and aggregate them into a single score called the KOOS pain score. So it's the result of a survey.

[00:21:36] The data we're going to be using comes from the Osteoarthritis Initiative; it's publicly available data. All the results I'm going to be presenting are on about 300 people, and we're going to be comparing pain by three binary groupings. We're going to be comparing black to non-black patients, where almost all the non-black patients in the data set are white, and we're going to be comparing lower- and higher-income
patients, and lower- and higher-education patients.

[00:22:00] So what do I mean when I say disadvantaged patients have more pain? Here what I'm showing you is a vertical histogram with pain on the y-axis, where lower scores indicate worse pain. I'm showing you the histograms for black versus non-black patients, and you can see that there's a big visual difference in the histograms: black patients have worse pain. If you want to summarize it in a single measure, you can just take the difference in means for the two groups, and it's about 10.6 points on the KOOS scale, which is about two-thirds of a standard deviation. So it's a big gap.

[00:22:30] The results for income and education are somewhat smaller, but still substantively large and statistically significant. The numbers I'm showing in parentheses are the confidence intervals.

[00:22:42] So what happens when we control for severity? Does the pain gap go
away? It turns out that it doesn't. So now the graph I'm showing you at right has severity on the x-axis, that KLG score I was telling you about before, and pain on the y-axis as before. The important point from this graph is that the orange and blue lines are not on top of each other: even conditional on severity, there's a gap in pain between black and non-black patients. And if we want to summarize the size of that gap in a single number, the standard way to do so is with a linear regression. Specifically, we run a regression of pain on race and KLG, and that tells us the size of the pain gap when we control for that severity score.

[00:23:24] I've shown those numerical results in the second numerical column: you can see that for race, for example, the pain gap shrinks from 10.6 points, when we don't control for anything, to 9.7 points when we do control for KLG.
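The control-for-severity regression can be sketched on synthetic data (every number below is made up; the real analysis uses the OAI data). The point of the sketch: if the pain gap is driven by something independent of KLG, adding KLG as a control barely moves the coefficient on the group indicator.

```python
# Sketch of "pain gap, with and without controlling for severity" on
# synthetic data; all coefficients here are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)   # hypothetical disadvantaged-group indicator
klg = rng.integers(0, 5, n)     # severity grade, 0-4, independent of group

# KOOS-style pain score (higher = less pain), with a 10-point gap for
# the disadvantaged group that KLG does not explain.
koos = 80.0 - 3.0 * klg - 10.0 * group + rng.normal(0, 15, n)

def gap(controls):
    """OLS coefficient on the group indicator, given control columns."""
    X = np.column_stack([np.ones(n), group] + controls)
    coef, *_ = np.linalg.lstsq(X, koos, rcond=None)
    return coef[1]

print(f"unadjusted gap:      {gap([]):.1f}")     # about -10 points
print(f"controlling for KLG: {gap([klg]):.1f}")  # still about -10
```

Because the simulated gap comes from outside KLG, the two coefficients are nearly identical, mirroring the 10.6 versus 9.7 pattern in the real data.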
The important point is that it really doesn't get all that much smaller, right? 10.6 is almost as big as 9.7; it only gets nine percent smaller. And the results for income and education are similar. So the high-level takeaway is that controlling for severity doesn't do very much to narrow the pain gap. This isn't our unique finding, by the way; other studies find this as well. The goal of our paper is to explain why: why is there a pain gap even conditional on severity?

[00:24:05] Specifically, we're going to try to differentiate between two theories. The first theory we call the "outside their knees" theory: namely, that there are non-knee-related factors which are causing disadvantaged patients to report higher pain even when their knee disease is no more severe. And this isn't just some crazy theory we plucked out of thin air; a bunch of prior work points to some factors
that might cause higher pain in disadvantaged groups: maybe higher life stress, differences in access to pain medication, differences in how different groups report pain. There are a whole bunch of possibilities. The commonality, though, is that whatever the factor is, it isn't anything that can be seen in a knee x-ray; it's something outside the knee.

[00:24:47] But there's a second possibility, right? And we call this the "in their knees" theory: namely, that there are pain-related ailments in the knee x-ray which KLG isn't capturing, and if we could capture these physical features, we would be able to explain more of the pain gap. So under the first theory, there's nothing to be seen in the knee x-ray that would explain this gap; and under the second theory, there is something to be seen that KLG isn't picking up.

[00:25:16] So why is the second hypothesis plausible? Here are two reasons. The first
is that we don't understand pain all that well. This is true generally; it's also true in osteoarthritis specifically: KLG just doesn't explain all that much of the variation in pain. And a possible reason for this is that KLG was developed decades ago in heavily white British populations, and so it's plausible that it's not capturing all the environmental or occupational features that may be relevant to pain in modern and more diverse populations that may live and work very differently.
[00:25:47] So we're going to try and test whether there are overlooked physical features in the knee which would explain the higher pain levels in disadvantaged groups. This isn't just an academically interesting question; it's also a question with concrete clinical implications, and the reason is that whether you get knee surgery depends on whether the source of your pain is in your
knee. If you go to the doctor in a lot of pain, and she looks at your knee and says, "I'm sorry, I can't see what's wrong with it," she's unlikely to get you knee surgery for an apparently healthy knee; she's more likely to prescribe non-specific therapies like opioids or other painkillers. In contrast, if you go to the doctor in a lot of pain and she says, "Aha, I know exactly what's wrong with you: you have very severe radiographic arthritis, you're a four on the Kellgren-Lawrence scale," then it's much more likely under clinical guidelines that you'll get some kind of surgical intervention. Consequently, if KLG is missing true sources of pain within the knee in disadvantaged groups, these groups may be under-referred for surgery.
[00:26:43] Okay, so we're going to try and test this, and methodologically what we're going to do is train a convolutional neural
network (this is how you know this is sophisticated, because we're using deep learning) to search for additional signal in the knee x-ray which would explain the higher pain levels in disadvantaged groups.
[00:27:00] So what does that actually mean, how are you going to search for additional signal in the knee x-ray? Well, the standard approach to searching for signal in a medical image is to train a model to replicate the doctor's clinical judgment: to train it to predict KLG. The problem, though, is that if KLG doesn't capture all the pain-relevant features, we don't want to just replicate it. We don't want to set a ceiling of clinical knowledge when, by hypothesis, that clinical knowledge might be biased or incomplete.
[00:27:26] So instead, what we're going to do is train the model to learn from the patient, by predicting the KOOS pain score. So to be very clear: the input to the model is an x-ray of the knees, and the output is a knee-specific pain prediction.
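The point about the choice of prediction target can be illustrated with a toy example. Everything below is synthetic and hypothetical (made-up linear "features," not real x-rays or the study's actual pipeline); it just shows how a model trained to replicate a grade that misses some pain-relevant features inherits that ceiling, while a model trained directly on reported pain does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 8

# Synthetic "image features": the first 4 drive the clinician grade;
# features 4-7 also matter for pain but are invisible to the grade.
X = rng.normal(size=(n, d))
grade = X[:, :4] @ np.array([1.0, 0.8, 0.6, 0.4])
pain = grade + X[:, 4:] @ np.array([0.9, 0.7, 0.5, 0.3]) + rng.normal(scale=0.5, size=n)

def fit_predict(features, target):
    w, *_ = np.linalg.lstsq(features, target, rcond=None)
    return features @ w

# Model 1: replicate the clinician (predict the grade), use that as the pain prediction.
pain_hat_clin = fit_predict(X, grade)
# Model 2: learn from the patient (predict reported pain directly).
pain_hat_pat = fit_predict(X, pain)

def r2(y, yhat):
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"R^2 for pain, grade-trained model: {r2(pain, pain_hat_clin):.2f}")
print(f"R^2 for pain, pain-trained model:  {r2(pain, pain_hat_pat):.2f}")
```

In this toy setup the pain-trained model recovers the extra features by construction; the real question, which the talk turns to next, is whether the same thing happens with actual x-rays.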
We call this prediction ALG-P, for algorithmic severity measure.
[00:27:43] And if controlling for this algorithmic severity measure, ALG-P, narrows the pain gap more than does controlling for the clinical severity measure, KLG, it implies that the clinical severity score is overlooking knee features which might explain disadvantaged patients' higher pain levels.
[00:28:01] Before I go to the results, any questions about the setup?
[00:28:11] In terms of comparing the pain gaps between different factors, like income and race, do we have to consider overlap between the groups?
[00:28:20] Yeah, that's a great question. There is overlap between the groups; there's correlation between all three of these binary variables. Each of the individual pain gaps remains statistically significant even when you control for all three at once. You could probably do an analysis where you sort
of controlled for all three at once, so that might be an interesting thing to do. Here, to keep the exposition as clear as possible, we looked at each group separately; but yeah, it's a good point, they're definitely correlated.
[00:28:50] Great, I think that's it for now.
[00:28:54] Okay. So our first result is that the algorithm does in fact find additional signal for pain in the knee x-ray: the algorithmic severity score ALG-P predicts pain better than the clinician severity score KLG. The r-squared is higher, the difference is statistically significant, and you see similar results for other predictive measures. But those r-squareds are really not that high, right? R-squared ranges from zero to one; if we're at 0.16, that's not all that high. And it's not the central question of our analysis anyway, which is: does controlling for the algorithmic severity score reduce the pain gap?
[00:29:26] And it turns out that the
answer to that second and more important question is also yes. So here, the first column is just what I showed you before: it says that when you control for KLG, the pain gap doesn't get that much smaller. But the second column is new: it says, when you control for the algorithmic severity score ALG-P, how much smaller does the pain gap get? And the final column gives the ratio of the two columns. So for race, for example, you can see that the algorithm explains 43 percent of the pain gap, while KLG explains only nine percent; the ratio of those two numbers is 4.7. The overall implication is that yes, there is overlooked signal in the knee x-ray which helps explain disadvantaged patients' higher pain, so this supports the "in their knees" hypothesis.
[00:30:12] Yes, you should never fit a neural net without doing a lot of robustness checks, whatever current computer science practice may be.
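Mechanically, "controlling for" a severity score and asking how much the gap shrinks is a comparison of regression coefficients. A minimal sketch with synthetic data (the coefficients and resulting numbers are all made up; only the mechanics mirror the analysis described above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000

# Synthetic data: a binary disadvantage indicator, a severity score that is
# worse on average in the disadvantaged group, and pain driven by both
# severity and a residual group effect.
group = rng.integers(0, 2, size=n)          # 1 = disadvantaged
severity = rng.normal(size=n) + 0.8 * group
pain = 5.0 * severity + 3.0 * group + rng.normal(size=n)

def gap(controls=None):
    """Coefficient on `group` in an OLS of pain on group (+ optional controls)."""
    cols = [np.ones(n), group]
    if controls is not None:
        cols.append(controls)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, pain, rcond=None)
    return beta[1]

raw = gap()
controlled = gap(severity)
print(f"raw pain gap:            {raw:.2f}")         # ~ 3.0 + 5.0 * 0.8 = 7.0
print(f"controlled for severity: {controlled:.2f}")  # ~ 3.0
print(f"fraction explained:      {1 - controlled / raw:.0%}")
```

The "fraction explained" line is the quantity being compared across the KLG and ALG-P columns in the table.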
And so we do a lot of them. I'm not going to talk about them now, but I'm happy to talk about them more later if people have specific questions.
[00:30:26] I do, though, just want to talk about two accessory results. The first is that a diverse data set improves performance. Specifically, we compare training the model on a non-diverse train set, from which we've removed all Black patients, to a diverse train set, from which we've removed the same number of non-Black patients. So the size of the train set remains the same; we've just altered its racial diversity. And what we find is that while both models beat KLG, using a diverse train set further boosts performance: you get a better r-squared, and you get a bigger reduction in the pain gap. You see similar results for income and education as well. So to put this within the broader context of AI in
medicine: there's been a lot of concern that training data sets may not be sufficiently diverse. And this is actually more broadly true than AI in medicine; this is true in medicine, full stop. So this sort of testifies to the importance of collecting diverse data.
[00:31:22] And then finally, to speak about the clinical implications: as I said, one of the clinical implications of having good severity scores is that they influence the way surgery is allocated. So we decided to test how using algorithmic pain scores would affect the way surgery is allocated. To test this, we replicate a previous study, and we assume knee surgery is given to patients with high pain and severe disease, so you have to satisfy two criteria. And we try measuring severity in two different ways: using KLG, the clinician severity score, and using ALG-P, the algorithmic severity score.
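The allocation test described here boils down to a conjunction of two criteria per knee. A toy sketch (synthetic thresholds and rates; in this made-up setup the algorithmic score flags a superset of the knees the clinician score flags, which is an assumption for illustration, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

pain = rng.normal(size=n)
# Two severity measures for the same knees: here the algorithmic one
# (hypothetically) marks additional knees as severe.
klg_severe = rng.random(n) < 0.15
algp_severe = klg_severe | (rng.random(n) < 0.15)

high_pain = pain > 0.5

# Surgery rule: high pain AND severe disease.
eligible_klg = high_pain & klg_severe
eligible_algp = high_pain & algp_severe

print("eligible under KLG:  ", eligible_klg.sum())
print("eligible under ALG-P:", eligible_algp.sum())
```

The comparison of those two counts, within a disadvantaged group, is the quantity the talk reports next.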
And we find that because ALG-P gives disadvantaged patients higher severity scores, it's in turn more likely to recommend them for surgery. For example, among Black patients, roughly twice as many knees were eligible for surgery when using the algorithm's severity measure as opposed to KLG's.
[00:32:12] So to summarize: we trained a deep learning algorithm to predict pain from knee x-rays. Our algorithm finds overlooked signal in the x-ray which helps explain disadvantaged patients' higher pain, and a clinical implication is that these disadvantaged groups may be under-referred for surgery.
[00:32:27] To put this within the broader context of AI in medicine and AI fairness: there's been a lot of previous and very important work on how machine learning methods can potentially increase disparities in medicine and in other high-stakes domains, and that's
super important. But we should also keep the more optimistic flip side in mind: machine learning and AI give us predictive superpowers, and they shouldn't inherently be a bad thing if we're wise enough to apply them properly. Specifically, here we show how machine learning methods can also reduce disparities, by detecting signal that humans miss. Key to our results here, key to reducing rather than increasing disparities, is, first, the choice of the prediction task: we didn't just try and replicate clinical knowledge. And second, we trained the model on a diverse data set, and we showed that that contributes to our results.
[00:33:13] Any questions about this before I go to the third and final story?
[00:33:17] Yeah, so we have a question from the first section's slides: can the Bayesian threshold test be applied where the observed data is the
output of an algorithm?
[00:33:33] I mean, you would have to give me more details, but I'm intrigued. The test is designed to assess bias in decision making, so whether the decision maker is human or algorithmic, you could apply it to both, I would say. In the case of an algorithm, it's likely that, at least in principle, someone knows the threshold, right? So it might be easier to just figure out the actual source code or procedure behind the algorithm rather than attempting to infer it. But there still might be some algorithmic settings where you don't know that threshold (for example, it's some third-party company and they won't tell you what they're doing), and then in principle you might want to apply it there.
[00:34:14] And then, on the line of determining whether or not
something is [00:34:20] determining whether or not something is discriminatory or biased um [00:34:22] discriminatory or biased um what metric would you suggest for [00:34:24] what metric would you suggest for testing if something like compass is [00:34:26] testing if something like compass is discriminatory [00:34:28] discriminatory so how do you know if in algorithms [00:34:30] so how do you know if in algorithms uh that [00:34:32] uh that that's a big question i you know i i [00:34:34] that's a big question i you know i i would say [00:34:35] would say it is highly context dependent um you [00:34:39] it is highly context dependent um you know if you observe large disparities in [00:34:42] know if you observe large disparities in things like you know in the case of [00:34:43] things like you know in the case of compass you see these big disparities in [00:34:44] compass you see these big disparities in like fpr and tpr fall spots are right to [00:34:46] like fpr and tpr fall spots are right to a positive rate um that should certainly [00:34:48] a positive rate um that should certainly be a red flag that you want to dig [00:34:50] be a red flag that you want to dig deeper on but then you want to try and [00:34:51] deeper on but then you want to try and understand like why are these things to [00:34:53] understand like why are these things to rising and how can i ameliorate the [00:34:55] rising and how can i ameliorate the situation i don't think [00:34:57] situation i don't think i i would not say like in all cases use [00:34:59] i i would not say like in all cases use auc and that is your golden answer you [00:35:02] auc and that is your golden answer you know no i don't think so [00:35:07] should i go [00:35:09] should i go yeah i think you're good to go [00:35:11] yeah i think you're good to go okay cool so um now i'm going to move to [00:35:13] okay cool so um now i'm going to move to our final story on inequality um this is [00:35:15] our final story on 
joint work with Serina and with Pang Wei, so I'm a little nervous, because Pang Wei will actually know if the details are wrong here. Serina is a computer science PhD student in Jure's lab. We also worked with Jaline, who's an epidemiologist at Northwestern, with Beth and David, who are sociologists, and with Jure, who's a computer scientist. So it's very interdisciplinary work, because we're studying inequality in COVID-19, and intuitively that draws on people in a bunch of different domains.
[00:35:47] Okay. So as you know, viruses like COVID-19 spread through human contact; that's why I'm giving this talk remotely rather than in person. Which is to say, there is an underlying contact network which modulates the spread of the virus.
[00:36:02] So under a simple epidemiological model, an infected person can infect anyone she comes into contact with, with some probability.
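This kind of cascade (each infected person infecting each of her contacts with some probability, those contacts then doing the same) is easy to simulate directly. A toy sketch on a random contact network; this is a minimal SIR-style illustration, not the model from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
p_edge = 0.02    # probability any two people are in contact
p_infect = 0.3   # per-contact transmission probability per step

# Symmetric random contact network (True = the two people meet).
upper = np.triu(rng.random((n, n)) < p_edge, 1)
contact = upper | upper.T

infected = np.zeros(n, dtype=bool)
recovered = np.zeros(n, dtype=bool)
infected[0] = True  # patient zero

for _ in range(50):
    # Number of currently infected contacts for each person.
    n_exposures = contact[infected].sum(axis=0)
    # Chance of escaping infection is (1 - p_infect) per infected contact.
    p_sick = 1 - (1 - p_infect) ** n_exposures
    new = ~infected & ~recovered & (rng.random(n) < p_sick)
    recovered |= infected  # infectious for one step, then recovered
    infected = new

print("total ever infected:", (infected | recovered).sum())
```

The structure of `contact` is doing all the work here, which is the talk's point: how faithfully you estimate that network determines what the simulation can tell you.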
Those people then infect their contacts, and then you get this incredible spread of the disease across the network.
[00:36:18] So because this network is so important to the spread of the disease, current models often attempt to estimate it in some way, so they can simulate the spread of the virus. But they often have to use simplistic estimates of the underlying contact networks, because intuitively it's very hard to know who everyone comes into contact with, unless you're living in some kind of surveillance state.
[00:36:37] So people do this in various ways. They might assume, for example, that anyone can infect anyone, so the network is fully connected. Or you might use some kind of network which captures trends at a very macro level, for example an airline network, which connects city to city but doesn't tell you anything
about the network within a city. Or you might use historical data and say: I'm just going to assume that what patterns looked like in 2016 are what they look like now.
[00:37:02] Intuitively, though, having really crude estimates of the contact network is not enough, for a couple of reasons. The first is that we're undergoing an incredibly dramatic change in human mobility, probably the most dramatic in any of our lifetimes (hopefully in any of our future lifetimes also, right?): we have these stay-at-home orders, reopening policies, everything is crazy. And the second is that we often want to ask very fine-grained questions that depend on mobility in a very fine-grained way. For example, we might want to know the impact of fine-grained reopening policies, like what happens if I open restaurants from 3 to 4 p.m. on Saturdays but not on Wednesdays, or something like
this. We also might want to understand inequality in infections, by race or by socioeconomic status, due to mobility patterns. And intuitively, if we want to do that, we need to understand mobility at a fine-grained level: simply understanding how New York is connected to LA won't be very useful for helping me understand disparities in infection rates within New York, for example between rich and poor New York neighborhoods.
[00:38:07] So because we have to understand this mobility network in a fine-grained way, our approach is a two-step approach. In the first step, we're going to try and estimate the human contact mobility network, and then we're going to try and build a model to capture transmission on this network. So let's talk about each of these steps in turn.
[00:38:26] So how do we estimate this network? Well, we're going to use cell phone mobility data from a company called
SafeGraph. Specifically, that data is going to tell us how many hourly visits there are from a neighborhood to a place. What do I mean by neighborhood? This is a census block group, which you can think of as a fairly fine-grained census area with a couple hundred to a couple thousand people. A place, which I'll refer to as a POI throughout the talk, is a point of interest, like a restaurant or a cafe or a religious establishment; you can think of POIs broadly as places people go when they're not at home. So our cell phone mobility data set basically gives us some sense of the number of hourly visits from a neighborhood to a place.
[00:39:08] So mathematically, what we're going to try and estimate is a network that links CBGs (neighborhoods) to POIs (places). You can think of this in various ways. You could think of it as a list of matrices, a list of networks,
where each network represents traffic at one hour, or you could think of it as a three-dimensional cube, where the dimensions are neighborhoods, places, and time slices. But that's the object we're going to try to estimate. [00:39:37] The problem we run into, though, is that the cell phone data that SafeGraph provides doesn't actually give us an exact estimate of that hourly network. The data they give us for the number of visits from CBGs to POIs is only at a weekly or monthly level, because of the way they aggregate their data, and they also censor it for privacy reasons. [00:39:57] So in terms of the actual data that we have: we have the number of hourly people going to each POI, the number of hourly people leaving each CBG, and then we have a noisy estimate of the networks connecting POIs to CBGs. So you can think of it as the number of people going out, the number of people
coming in, and then a noisy estimate of the matrix linking going out to coming in. [00:40:21] Now it turns out, luckily, that there is a machine learning algorithm which is designed exactly for this scenario, and which you will learn about if you're lucky enough to work with Pang Wei and the other people in Percy's lab (this is very much Pang Wei's work, and it's very fundamental to this project), and it's called iterative proportional fitting. [00:40:38] Basically, it's designed for exactly this setting. It says: let's imagine that you're trying to estimate some matrix, and you know the row sums of that matrix, and you know the column sums of that matrix, and then you have a noisy estimate of the matrix itself. [00:40:51] IPF is an algorithm that will give you back a matrix which is consistent with those row sums and column sums, and subject to that
constraint, is as similar as possible in terms of KL divergence to the initial noisy matrix. And that's exactly the setting that we're operating in here, so we use IPF to estimate the true mobility networks from the noisy SafeGraph data. [00:41:14] So that's a little mathy, a little abstract; let me give you a picture. Here what we're showing you is an example from the Chicago MSA, and we're showing you two time slices: the first, on the left, comes from early March, and the second comes from early April, after social distancing measures have started to take effect. The gray lines here represent the number of hourly visits from a CBG to a POI. [00:41:38] So you can see two things from this visualization. First, the density of the gray lines decreases, indicating that total mobility has decreased from March to April.
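The IPF step described above is simple enough to sketch in a few lines. This is an illustrative implementation of mine, not the project's code; names like `noisy`, `row_sums`, and `col_sums` are my own, and a real run would need more care with zeros and convergence checks.

```python
import numpy as np

def ipf(noisy, row_sums, col_sums, iters=200):
    """Iterative proportional fitting: repeatedly rescale a noisy
    non-negative matrix until its row sums (e.g. visitors leaving each
    CBG) and column sums (visitors arriving at each POI) match the
    observed marginals. The result is the matrix closest to `noisy`
    in KL divergence that satisfies both sets of constraints."""
    M = np.asarray(noisy, dtype=float).copy()
    for _ in range(iters):
        M *= (row_sums / M.sum(axis=1))[:, None]  # match the row sums
        M *= (col_sums / M.sum(axis=0))[None, :]  # match the column sums
    return M

# Toy example: both marginals must share the same total (10 visits here).
est = ipf(np.array([[1.0, 2.0], [3.0, 4.0]]),
          row_sums=np.array([4.0, 6.0]),
          col_sums=np.array([3.0, 7.0]))
```

In the setting of the talk you would presumably run something like this once per hour, seeding `noisy` with the aggregated weekly or monthly network.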
And second, most of the lines are vertical, indicating that people mostly hang around their own homes, and that makes sense. [00:41:57] Okay, so now we've got our network. Honestly, if you didn't understand any of the math, that's fine; the main point is we have a network linking POIs to CBGs at an hourly level. Now we have to put a disease transmission model on top of this network, and this relies on a pretty simple epidemiological model. I'm going to give you a 30-second crash course in epidemiology, and then you'll know about as much as I do about epidemiology. [00:42:20] So let's describe the model now. A very standard model in epidemiology is called an S-E-I-R model, and probably some of you have heard of this if you've been reading the news. [00:42:28] The basic idea is that people move through four states in that order, S-E-I-R; you can't go in any other order, you can't go back,
and there are no loops. So how does this work? You start at the beginning, before a disease has entered a population, in the susceptible state, which is to say you don't have the disease, you've never had the disease, but you're susceptible to it. [00:42:46] Now, if you come into contact with someone who's infectious, you can move to the exposed state, which is to say you now have the virus but you're not infectious yourself yet; it's sort of in your body, but at low levels. [00:42:58] After some period of time, you move from exposed to infectious, meaning you have it and you can infect other people. And then, after some further period of time, you move to the removed state, which is to say you no longer have the disease and you can't catch the disease; maybe you've recovered, maybe you've died, but in any case you can't catch it again. [00:43:16] So what we're going to do is, at each hour of our
simulation, for each neighborhood, each CBG in our simulation, we're going to model the fraction of people in each of these four states. So we might say: in neighborhood five, at hour four, ninety percent of people are in the susceptible state, seven percent are in the exposed state, one percent are in the infectious state, and two percent are in the removed state. [00:43:39] And then we're going to update that hour by hour. [00:43:44] So we have to model transitions between these four states. Two of the transitions, the last two, are pretty straightforward and boring and don't depend on mobility: we just say at each time step you have some constant chance of transitioning to the next state. [00:44:00] But intuitively, the first transition, that S-to-E transition, is going to depend a lot on mobility, because whether or not you get sick depends on whom you come into contact with. [00:44:12] So how do we model this
critical S-to-E transition? [00:44:15] We assume that infections can occur in two ways: at CBGs and at POIs. You can think of CBG infections as: you're just hanging around your house, but unfortunately someone in your house is sick, and so now you're sick. You can think of POI infections as: you went out to a bar, there was someone in the bar who was sick, and now you yourself are sick. [00:44:35] So we assume that the CBG infection rate is just proportional to the fraction of a CBG which is infected; intuitively, if more people in your neighborhood are sick, it's more likely that you yourself will get sick. [00:44:47] The POI infection rate is a little bit more complicated. We assume that the probability of getting infected at a POI is proportional to the fraction of the POI which is infected, times a POI-specific factor which captures specific features about the
POI, like how big it is and how long people stay there. So intuitively, places that are smaller and more crowded are more dangerous, and that's what this part of the simulation is capturing. [00:45:18] A nice thing about this model is that it's relatively simple: for each city we're only going to have three free parameters, which remain fixed over time in spite of the dramatic changes in human mobility. Two of those free parameters are going to scale those two types of infections, infections at CBGs and infections at POIs, and then we're also going to have a parameter which scales the initial conditions in the model: what fraction of people started infected. The rest of the parameters we're just going to take from the prior literature; we're not going to estimate them at all, and this is important because it minimizes concerns about overfitting. [00:45:53] Okay.
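Putting the pieces above together, one hour of the simulation might look like the following heavily simplified sketch. All names here (`beta_home`, `beta_poi`, `psi`, and so on) are my placeholders rather than the paper's notation, and the real model is richer than this.

```python
import numpy as np

def step_hour(S, E, I, R, W, psi, beta_home, beta_poi, sigma, gamma, pop):
    """One hour of a simplified SEIR update on a CBG-to-POI network.
    S, E, I, R: per-CBG fractions in each state; W: hourly visit counts
    from each CBG (row) to each POI (column); psi: per-POI risk factor
    (capturing size, dwell time); beta_home, beta_poi: the two free
    infection-scale parameters; sigma, gamma: constant E->I and I->R
    rates taken from the literature; pop: CBG populations."""
    visitors = W.sum(axis=0)
    poi_inf = (I @ W) / np.maximum(visitors, 1.0)  # infected fraction per POI
    # S -> E: infections at home plus visit-weighted infections at POIs.
    lam = beta_home * I + beta_poi * (W / pop[:, None]) @ (psi * poi_inf)
    new_E = np.minimum(S * lam, S)
    new_I = sigma * E  # E -> I at a constant rate
    new_R = gamma * I  # I -> R at a constant rate
    return S - new_E, E + new_E - new_I, I + new_I - new_R, R + new_R

# Tiny two-CBG, two-POI example hour.
S0, E0, I0, R0 = (np.array([0.9, 0.8]), np.array([0.05, 0.1]),
                  np.array([0.03, 0.05]), np.array([0.02, 0.05]))
W0 = np.array([[10.0, 0.0], [5.0, 5.0]])
S1, E1, I1, R1 = step_hour(S0, E0, I0, R0, W0, np.array([1.0, 2.0]),
                           beta_home=0.01, beta_poi=0.1,
                           sigma=0.2, gamma=0.1,
                           pop=np.array([1000.0, 2000.0]))
```

Running this hour by hour over the IPF-estimated networks would give the kind of simulation described in the talk; note how the POI term makes smaller, more crowded, higher-`psi` places more dangerous.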
How are we actually going to choose those three free parameters? We're going to do what's called grid search: we're going to look over all possible parameter combinations of those three free parameters for each city. [00:46:04] How are we going to choose which one is best? Well, we're going to take real COVID case data, the number of COVID cases every day, from the New York Times, and we're going to keep the parameter combination which gives us the best fit to real cases in terms of RMSE. [00:46:18] Now, in order to capture uncertainty in the parameters, we're actually not just going to show results from that best-fit set of parameters; we're also going to use all parameter settings which yield an RMSE within 20% of that best-fit RMSE. And that captures the idea that, look, our parameters are somewhat uncertain here, and we want to capture that uncertainty. Some of you might be thinking,
oh, Bayesian inference or something might be a more principled way to do this. Totally agree; please figure it out and write to us, I think that would be awesome. Oh, but why didn't we do that? Because it's computationally difficult as it is to fit this model, and that would have been a further computational difficulty, but I think it's an interesting direction for future work. (And inference is only taught next week.) Oh, nice, so hopefully... did they understand this stuff at all? Well, you'll understand it even better next week, that'll be great. Anyway, next week you can figure this out for us. That sounds good. Okay, cool. [00:47:18] Anyway, we model early March to early May, and we chose that time period because that's what was available while we were doing this analysis.
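The fitting procedure described above (exhaustive grid search scored by RMSE against reported daily cases, keeping every setting within 20% of the best) can be sketched as follows. `simulate` is a hypothetical stand-in for running the full model with one parameter triple; the names are mine, not the paper's.

```python
import itertools
import numpy as np

def rmse(pred, obs):
    """Root-mean-square error between predicted and observed cases."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))

def fit_by_grid_search(simulate, observed_cases, grids):
    """Score every combination of the three free parameters against
    reported case counts; return the best-fit triple plus all settings
    within 20% of the best RMSE, to reflect parameter uncertainty."""
    scored = [(rmse(simulate(*params), observed_cases), params)
              for params in itertools.product(*grids)]
    best_rmse, best_params = min(scored)
    kept = [p for r, p in scored if r <= 1.2 * best_rmse]
    return best_params, kept

# Toy check: pretend daily cases depend only on the sum a + b + c.
best, kept = fit_by_grid_search(lambda a, b, c: np.full(5, a + b + c),
                                np.full(5, 6.0),
                                grids=([1, 2], [2, 3], [1, 3]))
```

A Bayesian treatment, as the speaker notes, would quantify the same uncertainty more formally, at a higher computational cost.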
[00:47:26] Cool, okay. So to make things a little more concrete, I want to show you a short video of how this model looks over time. Is this actually going to work? Praying... okay. [00:47:38] Okay, so what's going on here? I'll just talk you through the three graphs in turn. The graph at right is showing you mobility over time; this is not from our model, it's from the raw data. The y-axis is the number of visits to POIs; you can think of it as a measure of overall mobility. And you can see that it drops pretty dramatically about three weeks into the simulation, so three weeks into March, as social distancing takes effect. [00:48:01] The middle graph is model output. It's showing you the fraction of people the model thinks are in each of the four states, and it's a logarithmic graph. And you can see that the
fraction of people in the E, I, and R states, i.e. those who've had the disease, rises over time. And you can also see how mobility is feeding into the model: for example, if you look at that E state, you can see a very high frequency wiggle, just like there is a very high frequency wiggle in the mobility patterns. Those are daily changes in mobility, over the course of the day, and that's basically telling you, look, people are more likely to get sick when they're going out in the middle of the day than in the middle of the night. [00:48:38] And finally, that graph at the left is showing you spatially, geographically, where the model thinks people are most likely to get sick. Redder indicates that a larger fraction of the population is in one of the infected states, and you can see that especially red segment in the middle of the city. I'll return to that point in a bit.
[00:49:02] So, okay, anyone can make pretty graphs; does this actually fit the data? Yes, it turns out it does fit observed case count data reasonably well. Here the orange x's are reported cases and the blue is the model prediction, and you can see that it fits the observed data reasonably well, even if, as in the left plot, you only fit the model on data prior to April and then see how well it performs on data from April to May: it continues to fit the data reasonably well. [00:49:31] This isn't just true in Chicago; that's not a cherry-picked example. It fits data pretty well across cities. [00:49:37] And it turns out that it also fits the data better than two baselines that we tried comparing to. But I think the high-level point here is not that we have some super-duper predictive model; the high-level point is, look, we have this model that fits the data reasonably well but also enables you
to ask very fine-grained questions. So let's talk about some of those fine-grained questions now. What are some of the questions you can ask with this model? [00:50:03] So this model lets you ask what would have happened if we had done something differently: what if we had started distancing a week later? What if we had distanced only 50% as much as we actually had? [00:50:14] It can help you ask stuff like: what are the riskiest locations, the riskiest POIs? Are there POIs which are likely to be super-spreader locations because they have a ton of people? [00:50:24] It can help you answer questions like: what's the impact of different reopening strategies? What happens if you reopen POIs only halfway, for example only to half of their maximum capacity; what do infection rates look like under that scenario? [00:50:37] And finally, it can help you understand why socioeconomic and racial disparities arise. Today I'm actually only
going to talk about that fourth question; the answers to the other questions you can find in our paper, and I think there are probably other interesting questions you can ask as well. The basic point, though, is that because the model captures mobility from neighborhoods to individual places in such a fine-grained way, you can ask a lot of questions that naturally flow from that fine-grained mobility network. [00:51:08] Okay, so let's talk briefly about disparities. So we know that socially disadvantaged racial and socioeconomic groups were hit harder by COVID-19: higher case rates, higher death rates.
The disparities are very dramatic; that's not our work, that's prior work, and it's very clear, very striking. [00:51:25] So there are a bunch of reasons for this, right? It's not all mobility; it's stuff like pre-existing conditions, differences in access to care, worse care when they do get into the hospital, etc. But mobility is probably part of it too. We know, for example, that if you are of lower socioeconomic status it's harder for you to work from home, more likely that you're an essential worker, more likely that you have to go out and do these dangerous jobs and expose yourself to risk of infection. [00:51:50] So it's interesting to ask: first, does our model learn that disparities flow in part from mobility? Like, can the model naturally predict the emergence of these disparities? And second, if it does, can it expose the
mechanisms via which these disparities arise? [00:52:06] In order to study this, we don't actually have data on individual people, so what we do is we compare neighborhoods: we compare higher- and lower-income neighborhoods, for example, and we look at how infection rates vary. [00:52:19] So a first result is: yes, the model does predict the emergence of these disparities based on mobility patterns alone. Here the left graph is showing you disparities by income and the right graph is showing you disparities by race. On the x-axis, what we're plotting is how much likelier people are to get infected: for the left graph, if you're in a lower-income CBG, and on the right graph, if you're from a less white CBG. [00:52:44] And you can see basically that all those boxes, all the blue boxes, are to the right of one, indicating that people are likelier to get infected under the simulation if they're from a lower-income or a
less white CBG. [00:52:58] So the model is predicting these SES and racial disparities, socioeconomic and racial disparities, on the basis of mobility patterns alone. [00:53:06] And because the disparities by socioeconomic status are particularly dramatic, I'll focus on those for the rest of the talk, but you can see the results for both in the paper. [00:53:16] So why is this happening? Well, we show two mechanisms via which it arises. The first you probably already guessed: people from lower-income and less white CBGs weren't able to reduce their mobility as much; they had to go out more, and this is probably in part because of differences in occupation, since they're more likely to be essential workers. [00:53:36] But the second mechanism is a little subtler, and it's this: when they do go out, they go to places which are smaller and more crowded, and therefore more dangerous, and this
is true even within the same type of POI. So even conditional on "I went out, I went to a restaurant," the people coming from lower-income CBGs tend to go to restaurants that are smaller, more crowded, and more dangerous, and that's the second thing contributing to these infection-rate disparities. [00:54:06] So I want to show an example of this for Philadelphia, which is the place where we see the most striking disparities; let's see if I can get this to play. [00:54:15] Okay, so this graph on the left is showing you Philadelphia, and it's showing results over time, and what you can see over time is that this big red spot emerges in the middle of Philadelphia. And where is that? Well, it turns out to be the place with the highest population density, that's the top right plot, and it's also the place with the lowest income. So this very high-density, low-income area has
higher predicted infection rates in our model, and that's happening because the POIs people are going out to are smaller, more crowded, and more dangerous. [00:54:52] A final implication is that the model can look at the predicted impact of reopening plans for people in lower-income deciles, as opposed to the population as a whole. And basically what we show is that reopening plans often have larger predicted impacts for people in lower-income deciles, so for people in poorer neighborhoods, than for the population as a whole. So when you do consider a reopening plan, it's important not just to consider the overall impact but also the impact on poorer neighborhoods. [00:55:22] And in fact, California is starting to consider doing things like this: you have to look at racial disparities in reopening and racial disparities in impact; you can't just look at the impact
on the population as a whole. [00:55:33] This is also good practice, by the way, when you're evaluating the impact of an algorithm: you shouldn't just look at how it performs on the population as a whole; you need to also look at how it performs on different subgroups. [00:55:45] So, takeaways. This approach showcases the power of fine-grained mobility networks. We show that even a simple model leads to accurate fits in 10 different American cities, metropolitan statistical areas. We show that it can scale, even to large networks with lots of places and lots of people. And because you can capture these very micro trends, down to neighborhoods and locations by the hour, you can perform detailed analyses that can potentially inform more equitable responses to COVID-19.
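The subgroup check described above, comparing each group's infection rate to the overall rate rather than reporting only a population-wide number, can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual pipeline: the group names and counts below are made up, and the real analysis compares census block groups (CBGs) by income decile and racial composition.

```python
# Hedged sketch of per-subgroup evaluation: compute each group's infection
# rate relative to the overall population rate. A ratio above 1 corresponds
# to the "blue boxes to the right of one" in the talk's figure.
# All numbers here are invented for illustration.

def infection_rate(infected, population):
    return infected / population

def relative_risk_by_group(groups):
    """groups: dict mapping group name -> (infected, population).
    Returns dict mapping group name -> infection-rate ratio vs. overall."""
    total_infected = sum(i for i, _ in groups.values())
    total_population = sum(p for _, p in groups.values())
    overall = infection_rate(total_infected, total_population)
    return {name: infection_rate(i, p) / overall
            for name, (i, p) in groups.items()}

# Made-up simulation output: the lower-income group comes out above 1,
# the higher-income group below 1.
sim = {
    "bottom_income_decile": (900, 10_000),   # 9% infected
    "top_income_decile":    (300, 10_000),   # 3% infected
}
ratios = relative_risk_by_group(sim)
```

The same pattern applies when evaluating any algorithm: compute the metric per subgroup and compare it to the aggregate, rather than trusting the aggregate alone.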
I think a general question that I would have for people, and I don't know if we want to talk about this now or at the end or not at all, is: what are other questions you might want to answer with this model? [00:56:27] Because I think there are a lot of other things you can potentially ask beyond what we have asked, and I'm curious as to your thoughts. Should we return to that point at the end, though? I don't know how we're doing on time. Yeah, we have 20 minutes left, so maybe we can take some questions now and then we'll move on. Sounds good. [00:56:46] Cool. So actually, if you guys have working mics, do you want to read out your own questions? [00:56:51] So my question was if you are able to take into account the percentage of people that wear masks. We are not; that's a great question. You're, uh, reviewer two, three, one, I don't know how many reviewers have had that question. We do not attempt to take
into account the fraction of people wearing masks, and I think that's an interesting direction for future work. [00:57:13] Hi, I suppose this sort of goes along with the question you put at the end of the slide, which is what other questions you might answer with this model. I was wondering: could this model of mobility be used to analyze other mobility issues that don't revolve around health or epidemiology, such as how different types of zoning codes, or access to and use of public transportation in different CBGs, affect the mobility of those neighborhoods? [00:57:37] Yeah, absolutely. SafeGraph data is very broadly relevant to social science and other questions of mobility, and we're using it for other projects as well; it's definitely a gold mine for other questions. So yes, and yes. [00:57:55] I was wondering, do you think it's possible that we can make connections
between, like, physical mobility between CBGs and POIs, and whether that somehow correlates to the degree of socioeconomic mobility within CBGs? [00:58:13] Yeah, and you might look at Susan Athey and, uh, Gentzkow; Athey and Gentzkow would be the names to look up. They look at socioeconomic segregation, sorry, they look at racial segregation, using SafeGraph data, but then they correlate it with other measures of economic opportunity, using work from Raj Chetty, I think, in their paper. So that's absolutely something you can do; I mean, causal claims are hard, but it's so interesting. [00:58:39] And I think that's it. Cool, all right, let's go on then. Great. [00:58:45] Okay, cool. So I was asked to speak briefly about how I ended up on this path and doing this kind of work, just in case it's helpful to people, so I attempted to
write this down. [00:58:58] Okay, so, you know, I liked math and physics and other similarly nerdy stuff ever since I was a little kid. This is a picture of me dressing up as a chessboard for Halloween, so you can tell I was super cool and definitely had a ton of friends. [00:59:11] I took my first AI class in high school, but I was the only girl in the class and I had a lot less experience than the boys, and some of them made fun of my lack of experience and told me I was the worst in the class. So by the time I got to Stanford, I had actually decided I was not particularly good at computer science, and I came to Stanford as a physics major and did not even take any CS classes my first year. [00:59:38] But in my second year at Stanford, I decided I should give computer science another try, and so I actually enrolled in this class, which at that point was not taught by Percy. I
don't know who, who were the teachers? No? Mm-hmm. So that, I think, and some others; I don't even remember who taught the class. I do remember the class was awesome, and honestly, this is not propaganda, it was actually true: I thought computer science was super cool. [01:00:04] And that summer I started doing computer science research in a physics lab; I was developing algorithms to identify certain types of galaxies. But I realized something was missing. I thought AI was amazing, but I didn't want to use it to study galaxies that were millions of light-years away; there were too many problems that were closer to home. [01:00:24] And this was really driven home for me a few months later, when I got a genetic test which told me I carried a mutation that gave me a very high risk of getting cancer. It's called
a BRCA mutation; some of you might have heard of it. [01:00:35] And as you can imagine, this was pretty difficult news to receive as a 20-year-old, and I spent the next few months being pretty upset about it. [01:00:43] During this period, when I was trying to come to terms with the news, I came across this paper out of Daphne Koller's lab. She was an AI professor at Stanford at the time; I'm sure many of you have heard of her. These days she's in industry. [01:00:55] In her paper, they take images of cells from cancer patients and apply a computer vision model to try to predict whether the patients will survive. These days you would probably do this using deep learning; back then they were using old-school computer vision. I thought it was the most amazing thing, and more importantly, it gave me hope. I thought: fine, I have this cancer problem; I'll work on this problem with AI. I knew
that I wasn't going to cure cancer, but I thought that working on it and learning about it would make me feel better; understanding the things that frighten you often does. [01:01:25] And so I wrote to Daphne, and she wrote back on New Year's Day and offered me a spot doing research in her lab, and I took it. [01:01:31] There's a lesson I take from this kind of tough period in my life, which is that crying is underrated, or as Gandalf puts it in The Lord of the Rings, "not all tears are an evil." I found tough times like this one to be very useful in crystallizing what matters to me; from that point on I became much more intensely focused on what I wanted to do in life. [01:01:51] So then I graduated from Stanford with a bachelor's in physics and a master's in computer science, and I decided to take a job at 23andMe, which was a genetics company that offered very cheap testing for the BRCA
mutation, which I carried. Their test was so cheap that my little sister, who was then only a teenager, could afford to get tested, and she learned that she did not carry the same mutation that I did. And I thought that was amazing, you know, expanding access to genetic testing in that way, so I accepted a job at 23andMe. [01:02:19] But about a month after I arrived at 23andMe, the US government sent them a letter ordering them to stop selling their health-related tests, because they hadn't gotten the proper regulatory approval. So they could no longer sell their BRCA test, which was the whole reason I had gone to the company in the first place, and for the entire year I was there, I basically didn't do any BRCA research at all. [01:02:38] Which I think is another important lesson: even if you start out on a path with the best of intentions, it's very easy to get really derailed, at
least for me, and it's very hard to predict what projects will pan out. [01:02:51] Then I went back to school to start my graduate research. I was still very motivated to do cancer research, and over the next couple of years I wrote a half-dozen papers developing AI methods for computational biology. But I began to feel my work was unsatisfying, because I was still too far away from real people's lives. [01:03:09] Early in my graduate research, my grandpa, who carried the same genetic mutation that I do, died of brain cancer. We were very close, that's us playing chess up there when I was little, and I wrote a fair bit of my master's thesis next to his hospital bed. [01:03:23] The thesis develops new dimensionality-reduction methods, like a fancy PCA or factor analysis, basically, if you've heard of those things, for a certain type of biological data which is important in cancer and many other settings. And work like that, work
like what I was doing, just felt very far away, decades away, from helping people like my grandpa. Which is not to say that no one should do it; I think it's super important that there are people doing that kind of fundamental research, even if it doesn't touch real people's lives for a long time. But I began to feel that it wasn't for me, that I wanted something that was going to help people in the short term if I was going to be happy with the sort of research I was doing. [01:04:02] So based on that understanding, I started working on datasets where each row was not a cell or a gene or something very abstract, but a person. I kept working on healthcare problems, and I also started working more on inequality. That took me forward to some of the problems that I've told you about today, studying things like inequality in pain, in policing, and in COVID, which feel very concrete
to me. [01:04:31] Looking back at my research, I see a lot of failures and wrong turns. I went to 23andMe to research BRCA, and I failed to do that. I went to grad school to study cancer, and I mostly failed to do that. I've spent more than 10,000 hours of my life getting a PhD, and I think it's fair to say that many of those hours have not made anyone's life better. [01:04:50] There's a lot of time running down blind alleys, and even when you do have a good idea, there's a lot of time polishing it and repolishing it to get it published. And even when you do publish it, often very few people read it; and even when people do read it, does it actually change their minds? [01:05:04] A few months ago, a man contacted me because he was writing a New York Times piece about our work on policing that I was just telling you about, and I went back and forth with him, meticulously trying to make sure he stated
our conclusions accurately; I'm sure he was very sick of me. And when the piece finally came out, I read the New York Times comment section, and it was obvious that none of the commenters were actually reading our research; they were just spouting what they already believed. And that is probably the project I've gotten to work on which has been most impactful. [01:05:36] But even though I've spent so much of my life failing to do good, I still think it's important to try, and that's the final topic I want to discuss. I outlined this part of the talk the night Ruth Bader Ginsburg died. I heard the news, and I knew I wasn't going to be able to write any more code that day, so I decided I'd just walk until I felt better. But unfortunately, it got very dark and cold before that happened. So you'll forgive me if this comes across as a little moralistic or maudlin; it wasn't the best evening.
[01:06:04] But before I tell you why I think you should try to do good rather than just making a lot of money, I want to acknowledge that there are students watching this talk who really do need to make a lot of money when they graduate. You have families to support, you have huge student loans, these are frightening economic times, and if that describes you, I'm not going to lecture you, and you should please feel free to ignore this bit. [01:06:24] Still, I can almost promise that for some of you listening to this talk, there will come a point, not too long from now, where you will have a choice between multiple jobs which are all fun, all interesting, and all pay you more money than you could possibly need as a young person; that's what the Stanford computer science salary survey shows for the last six years I have data for. [01:06:45] And when that moment comes, I'm asking you to choose a job
that makes the world better, and not just in some trivial way, and not just for the very richest people. I'm not asking you to donate a kidney or storm the beaches at Normandy or risk your lives treating COVID patients. I'm asking you to choose to make a large amount of money as opposed to an obscene amount of money. It's just not that big a sacrifice in a world with such desperate problems, where we've gotten so lucky. And I also think you'll find that you'll get more enjoyment out of whatever money you do make if you feel like you earned it doing something meaningful. [01:07:19] The other reason I think we're compelled to fight for good is that there are a lot of people doing the opposite. I don't want to get political about this, but I think we've all seen just how catastrophic the consequences of that can be. So if we who are given the most power to
push the world in the right direction take morally neutral or morally harmful careers instead, the world will slide in the wrong direction. I am only here giving this lecture, many of you are only here listening to it, because people like Ruth Bader Ginsburg woke up every day for decades vowing to push the world in the right direction, to expand the circle of people allowed to be in classes like this one. She could have gone into corporate law instead; apparently she had a taste for Armani suits, she could have bought a lot more of them. [01:08:02] I think ultimately a lot of us take high-paying, socially neutral, or socially harmful jobs not because we really need all that money but because we've internalized the implicit and insidious claim that if we make a lot of money, we're good engineers, we've made it in life, we're worthy of respect. We need to break that link. We need to redefine what it means to be a good
engineer. A few weeks ago I got an email from a recruiter from a big finance firm, and I responded the way I typically do: I told him I don't work for finance firms. And he asked me why I didn't want to work with the best engineers in the world. And I thought: the best engineers in the world think about the social implications of their work. The biggest factor determining your impact will not be whether you understand all the variants of gradient-based optimization, although you, like, should put some effort into learning those, both because they are very useful and so Pogba, like, won't kill me. Um, the biggest factor determining your impact will be the problems you choose to work on. That's what makes a great engineer. [01:09:02] I'll close with a quote from Ta-Nehisi Coates' Between the World and Me, which is a letter he writes to his son, who is about your age. He writes: I would have
you be a conscious citizen of this terrible and beautiful world. This is what I would wish for my child, and for my students, and for myself, and for you as well. Thanks very much for listening, and I'm happy to take any further questions. [01:09:24] Uh, hi, well, thank you very much, this was impressive, listening to you. Um, well, I was thinking while you were talking about this medical impact, and how to study that, especially with regards to the bias, uh, in the population. So, um, there are quite a lot of, actually, biases, cognitive biases, that doctors can show while making important medical decisions. Are we able to study somehow, I mean, how this happens, so, just to help them avoid and eliminate or reduce this kind of risk? [01:10:18] Yeah, totally. I mean, I think this sort of, like, behavioral economics approach to, like,
let's understand doctors' biases and put them in terms of sort of these common heuristics that people use, these common biases that people have, um, is a broad and promising line of research. I'm not a behavioral economist. For one example of this type of work, I would point you to a recent paper, it's called, like, "Who Gets Tested for Heart Attack and Who Should Be," um, and it sort of studies, you know, how do these cognitive, uh, heuristics that people use play into decisions like this. The authors you should look for are Sendhil Mullainathan and Ziad Obermeyer, and there may be some other authors as well. But the broad answer to your question is yes, you know, you can absolutely study doctors' biases in terms of sort of things we know cognitively about people and how they make decisions. [01:11:06] Thank you. [01:11:10] Uh, the other, uh, could you read out your question? [01:11:17] Sure, yeah, just about the difference
between races. Um, I'm not sure if there's any study, actually, um, about this: whether any difference in, you know, IQ, happiness, is what actually caused the difference, or whether there's a difference as, I guess, either a correlation or a consequence. Is there any study looking deeper into this to understand the difference? [01:11:46] Uh, I mean, there are a lot of studies looking at differences by race and ethnicity. Uh, this is a fraught topic. You know, some of the studies have not been good studies. So, like, in particular, studies of racial differences in IQ, I think it's a very fraught topic. And then there's stuff which is, you know, not at all fraught, like, let's look at racial differences in, I don't know, incidence of breast cancer, or deaths from breast cancer. So yes, I mean, there's a lot of research in this area, um, of varying quality, uh, but a
lot of it is super important. It is extremely difficult to figure out causality here, and many studies that claim they can are pushing a particular political agenda and should be treated with a lot of skepticism. [01:12:33] Okay, thank you. [01:12:40] Okay, uh, since we are about time... [01:12:48] Yeah, sounds good. So if there's no other questions, uh, let's, um, if you can mute and clap, that would be great. I'll count to three so we can all give him a really big round of applause. That was an amazing talk. [01:13:11] Thank you, it's a pleasure. Thank you for the great questions.
================================================================================
LECTURE 055
================================================================================
Fireside Talks: Artificial Intelligence (AI) and Language
Source: https://www.youtube.com/watch?v=pI72PseZQo8
---
Transcript
[00:00:05] Okay, great, let's get started. So welcome everyone to the fireside chat, or talk, on AI and language. Um, so today we're going to do something a little bit different. I want all of you to go to Slido,
and the guest code is cs221. So we're going to try to use this platform for doing Q&A, and also I'm going to have a number of polls throughout the talk. So the first question, if you click here, you'll hopefully see that it's: what city are you in right now? [00:00:38] And I already got some responses: San Jose, Palo Alto, Stanford, Seattle, um, College Station, Fort Mill, Cupertino, New York City. So welcome, everyone, from all over. Um, and if you go, oops, I guess this Zoom thing is in the way, okay, I guess I can't see that, but anyway, you can go to, there should be a Q&A tab, um, where you can go and type in your questions there. I'll try to monitor that throughout the hour. [00:01:20] Okay, um, all right. So I want to start by asking you a very simple question: what is the difference between these two cute little kittens and these two kids here? Anyone know the
answer? [00:01:33] Both can see, smell, taste, move around the environment. Kids are sometimes cute too. What's the main difference? Let's make this interactive, someone just shout out an answer. [00:01:56] Humans can talk. Humans can talk, yes, thank you. That is the main difference. And while animals do have some sorts of communication, especially songbirds and dolphins, and honeybees have their waggle dance, none, I think, could boast as rich and complex a language as the human language. So it's really, I think, language is something that's uniquely human and defines who we are. [00:02:24] So before getting into talking about AI and NLP, I want to spend some time just talking about why language is special, and hopefully we can get a richer appreciation for language. [00:02:37] So if I had one slide to summarize language, um, this would be it. So this is one of my favorite xkcd
comics. Um, some of you have probably seen it, but I'll just read it anyway, because I think it really highlights the right mode. So: "Anyway, I could care less." "I think you mean you couldn't care less. Saying that you could care less implies that you care at least some amount." "I don't know. We're these unbelievably complex brains drifting in the void, trying in vain to connect with one another by flinging words out into the darkness. Every choice of phrasing and spelling and tone and timing carries countless signals and context and subtext and more, and every listener interprets these signals in their own way. Language isn't a formal system, it's glorified chaos. And you never know for sure what any words will mean to anyone. All you can do is try to get better at guessing how your words affect people, so that you have a better chance of finding ones that will make them feel
something like what you want them to feel. Everything else is pointless. So I assume you're giving me tips on how you interpret words because you want me to feel less alone. If so, then thank you, that means a lot. But if you're just running my sentences past the mental checklist so you can show off how well you know it, then I could care less." [00:03:56] So what do we learn from this? So, first, language is social. It's meant for communication, right? I think a lot of us, coming more from a kind of data or ML background, think language is just a body of text, but it's really this dynamic thing that humans invented to communicate with each other. The other thing is that language, or talk, is cheap, um, and something about language requires an incredible amount of trust between the people so that it actually can function. But, interestingly, it can also be used
to deceive, which is interesting, right now. And it's just kind of miraculous how language allows us to express all these different thoughts, from poetry to math to how to fix a bike, and so on. [00:04:47] So where did language come from? The short answer is no one really knows, and it's really hard to pinpoint, because while writing came around 3000 BC, before then there was a long period of spoken language, and spoken language doesn't leave fossils or anything. Um, and there was so much skepticism, kind of controversy, around that, that the study of the origins of language was actually banned for about a hundred years, um, in Paris. So, but we do, just kind of conservatively, put an estimate: it started somewhere between maybe 2.5 million years ago, when Homo sapiens first came on the scene, and sometime around 100,000 years ago, which is when modern humans really
started doing things. Which is a huge range, but to put it in perspective, this is a very recent development compared to the history of all of life on Earth. [00:05:43] Um, and we know that it served an evolutionary purpose. So if you read Sapiens, this book by Yuval Harari, language is perhaps one of the key reasons why Homo sapiens became so dominant, because it allowed you to communicate and coordinate on such massive levels, for example when coordinating on a hunt, or communicating about food sources, and so on. Um, and interestingly, language allows you to talk about things that aren't here and now. That is probably one of the most powerful things, and in fact it allows you to talk about things that don't even exist. There's a whole genre called fiction that's about that, and things which are in the abstract. So,
um, in contrast, like I said before, you know, our kind of sister fields of, you know, computer vision and robotics tap into capabilities that have been around for much, much longer. Like, vision is over 500 million years old, and language is, you know, barely, let's say, conservatively, maybe a million or two years old. [00:06:54] So, uh, just for fun, let me do a poll. Um, let me actually, I have to create the poll first. So: what language do you speak, what languages do you use? Let's see, multiple choice, free text, okay, let's see if this works. I have to disable, okay. So go to the poll and I'll let you fill that out. [00:07:24] So we know that there's not one language, there are multiple languages. And furthermore, languages have evolved. So you can draw kind of a giant family tree of languages, and this branch just shows the Indo-European languages,
which cover all of Europe and Iran and some parts of northern India, which developed like 10,000 years ago. And this branched off into kind of Germanic languages and Romance languages; Germanic went into German and English, and so on. And today there are, um, 6,500 languages, many of which are actually going extinct, because language, again, is social, so if you don't have anyone to talk to, your language just kind of disappears. [00:08:12] And language is changing all the time. You know, English has definitely evolved since Shakespeare. But, you know, I think in grade school you're probably told that "they" is supposed to be plural, and you shouldn't use, say, "they" to refer to a singular person. But now, uh, especially with kind of this trend for, um, having gender-neutral pronouns, "they" is, you know, kind of proudly singular, and Merriam-Webster kind of declared it the word of the year
in 2019. Um, you can think of internet slang and emojis as also kind of a continuation of language into the kind of digital sphere, and so on. [00:08:49] Um, okay, so I'm getting a lot of English, Mandarin Chinese, Japanese, so, you know, quite a bit of, uh... Python 2, Python 3. Yes, very nice. [00:09:04] Okay, so one thing that I often get asked is, you know, of all these languages, are some harder or more powerful than others? Um, it's clear, and I think widely accepted, that all languages are kind of basically equivalent. But there is kind of this hypothesis from around the 1920s, called the Sapir-Whorf hypothesis, that says the structure of language affects speakers' world views. And you see this in kind of fiction, like George Orwell's 1984, which talks about a new language called Newspeak, which was simplified so that they could make
sure people couldn't even think to critique their government. Um, this has been challenged by a universalist school, Chomsky and Pinker, who think that language and thought are universal, and all the differences are very superficial, governed by a few parameters. [00:09:57] And it is true that languages do differ. They are largely the same: most languages have nouns and verbs that refer; we are all humans living in the same world. Um, but there are some differences. You know, one, for an example, is that English lacks what is called, uh, clusivity, which is the distinction, when you say "we," it's ambiguous whether you mean to include the listener or not to include the listener, whereas some languages, like Tamil or some Chinese dialects, actually have that distinction. Or, Mandarin Chinese lacks the distinction between past tense and present tense, but of course has other ways of, you know,
accommodating for that [00:10:41] ways of you know accommodating for that um [00:10:42] um so one question uh maybe to just have [00:10:45] so one question uh maybe to just have another poll is [00:10:47] another poll is let me stop that poll is do you believe [00:10:52] let me stop that poll is do you believe believe that language shapes thought [00:10:56] believe that language shapes thought now [00:10:58] now i know that these uh questions are [00:11:02] i know that these uh questions are um you know obviously not binary but i [00:11:04] um you know obviously not binary but i just wanted you to kind of you know get [00:11:06] just wanted you to kind of you know get a gut feeling are you leaning more [00:11:08] a gut feeling are you leaning more towards yes or no on that and i will [00:11:11] towards yes or no on that and i will activate that [00:11:12] activate that poll so if you are do you kind of abide [00:11:16] poll so if you are do you kind of abide more by the uh superior warf hypothesis [00:11:18] more by the uh superior warf hypothesis that the structure of language does [00:11:20] that the structure of language does influence how you think about the world [00:11:22] influence how you think about the world or do you think that uh all humans are [00:11:25] or do you think that uh all humans are really the same and as we just happen to [00:11:27] really the same and as we just happen to learn different languages and those [00:11:29] learn different languages and those differences are fairly minor [00:11:34] so can you guys see the [00:11:36] so can you guys see the um [00:11:38] um the numbers [00:11:40] okay so we have [00:11:42] okay so we have about 90 percent superior dwarf um [00:11:46] about 90 percent superior dwarf um and about 10 no okay [00:11:50] and about 10 no okay so uh this is a richly hot hotly debated [00:11:53] so uh this is a richly hot hotly debated topic in linguistics even you know to [00:11:56] topic in linguistics even you know to 
this day.

[00:11:59] Another fascinating thing about language is that we're not born knowing it. Babies can make sounds, but it takes them a few years to actually acquire language. And importantly, despite what their parents might think, they're not taught from explicit instruction or from teachers; rather, they learn naturally from immersion in language. By the time they're five, they have a fairly fluent grasp of the language and speak grammatical sentences. Language acquisition is also very multimodal and grounded: language accompanies sight and sound and actions and touch. And it's active: you can't teach a child by just putting them in front of a TV and expecting language acquisition to occur.

[00:12:53] So one of the big questions around language acquisition is the nature-versus-nurture debate, which affects other areas as well. The big question: is language innate? Chomsky, the famous linguist, came up in the 1950s with an idea he called the poverty of the stimulus. He said that the sentences a child hears can't possibly be responsible for the richness of language that's exhibited in actual humans. So he concluded that a large part of language must be innate.

[00:13:55] So, while trying to delete the last poll and add a new one at the same time as I'm talking, I'll ask you this: do you think language is innate? If you think about it, he does have a fair point, because as children we hear so few examples, not nearly enough to really capture all the cases; we constantly run into new language all the time; we have to generalize compositionally to longer sentences and new contexts; and we all land on the same language. So I think he does have a point. On the other hand, he wasn't an experimentalist; he was a classic armchair linguist who thought deep thoughts about how things should be. And one could imagine: what about the role of grounding? Maybe it's part of the experience that really shapes language acquisition, and maybe we are just really malleable.

[00:15:07] It seems like everyone's quite divided on this: 53 percent yes and 47 percent no. Okay, that's always fun, so maybe you guys can talk about it with your friends. What I will say, oh, now this is a tight race, what I will say is that no matter where you are on the spectrum, once you have kids you really think there's more innateness than not. Now it's equal, 50-50. Okay, great. Let's move on.

[00:15:44] Let's take a look at language itself; I'm going to introduce some basic concepts. There's a whole field, linguistics, that studies language, and if you're interested I really encourage you to take a linguistics class; I think it's one of the most interesting, eye-opening experiences. But I'll just go over some basics quickly. Here's a sentence: Beethoven was born in Bonn and displayed his musical talents at an
early age. Now, what's going on in this sentence? Linguists ask: what is the structure of the sentence that allows us to understand what it means?

[00:16:18] First of all, there's tokenization: the sentence is just a stream of characters, and tokenization is the process of converting that into words. Seems very simple, right? But as we'll see later, it's not as simple as it looks. Part-of-speech tagging is the idea that some words are nouns and other words are verbs, and some verbs are past tense versus present tense. Parsing goes a step further and captures the grammatical relationships between words; for example, "displayed" has a subject and an object, and the subject in this case is... what is the subject? Anyone? I know there are English speakers in the audience. Is it Bonn? Okay, I think it's Beethoven, right?
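The steps just described, tokenization plus a crude stand-in for named entity recognition, can be sketched in a few lines of Python. This is a toy illustration, not how real NLP systems work: the regular expression and the capitalization heuristic are my own simplifications, and a trained model would be needed in practice.

```python
import re

def tokenize(text):
    # Split a character stream into word and punctuation tokens.
    # Real tokenizers must handle clitics ("don't"), hyphens, URLs, etc.;
    # this single regex is only a toy rule.
    return re.findall(r"\w+|[^\w\s]", text)

def naive_ner(tokens):
    # Toy named-entity spotter: flag capitalized tokens that are not
    # sentence-initial (sentence-initial words are capitalized anyway).
    return [t for i, t in enumerate(tokens) if i > 0 and t[:1].isupper()]

sentence = "Beethoven was born in Bonn and displayed his musical talents at an early age."
tokens = tokenize(sentence)
print(tokens[:6])          # first few word tokens
print(naive_ner(tokens))   # ['Bonn'] (misses sentence-initial 'Beethoven')
```

Note how even this heuristic exposes the ambiguity the lecture mentions: "Beethoven" is missed precisely because capitalization at the start of a sentence carries no information.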
Okay, so even though "Beethoven" is very far away from "displayed," it's nonetheless grammatically very close to "displayed." This is a property of language: language has to be linearized, so not everything can be close to everything else. [00:17:42] Coreference resolution, or anaphora resolution, corresponds to the fact that some words are pointers to other words or concepts: "his" refers to Beethoven. Named entity recognition is the task of identifying which entities, usually proper nouns, are people or locations or organizations.

[00:18:04] So what is a word? Let's go down into a word. Here's the word "light." It seems like a word is pretty straightforward, at least in English. But the problem is that the unit of meaning, what conceptually should be a word, actually goes beyond a single word: "light bulb" is a unit, and "light" alone doesn't capture the full meaning. Sometimes the meaning unit is within a word: "lightening" isn't just a blob; it consists of "light" plus the suffixes -en and -ing, which turn it into a verb and then a gerund construction. There are also cases where words have multiple meanings, usually called word senses: in each of these sentences, "light" has a different meaning, and we figure it out based on context.

[00:19:02] Conversely, some meanings have multiple words that refer to essentially the same thing: that's synonymy. This also happens with sentences, which is called paraphrase: multiple sentences can get at the same meaning. One huge caveat is that there are no true equivalences between any two words or sentences; there are always subtle differences in meaning, so you can think of it more as a kind of distance metric. [00:19:33] There are also notions of relations between words, like hyponymy, which is "is-a" relations, and meronymy, which is "has-a" relations. These allow you to do entailment, which asks whether one sentence logically implies a second sentence. You can think of entailment as a problem that embodies a lot of different tasks: if you could solve entailment, you could do question answering, sentiment classification, and so on.

[00:20:11] I haven't been monitoring the Q&A; I don't think there are any questions. If you have a question, maybe just holler.

[00:20:21] So this is all about lexical semantics, the meaning of words. Then we talk about
compositional semantics. [00:20:30] Compositional semantics is a rich tradition that goes back to logic. This is Frege, who was a logician at the turn of the 20th century, and there are two ideas, model theory and compositionality, which I'll explain. The first is that sentences are just symbols; it's a convention that we say "block 2 is blue," and what the sentence means has to be associated with what is in the world: there's a world in which block 2 is actually blue. This is an important distinction that we gloss over, we don't even think of it, because language is so natural. The second is compositionality: the meaning of the whole is built from the meanings of the parts. Compositionality is the key thing that allows us to build more complex meanings out of smaller units, and it's probably the reason we can generalize to all sorts of new contexts: we've learned the meanings of the words, and we know how they combine, and that's how we can interpret new sentences in new contexts.

[00:21:49] Quantifiers, I think, are really interesting. "Every" is a word that says, well, it's hard to explain language without using language, "every" means every; hopefully the pictures tell you what's going on. And "some" is an existential quantifier. There's also quantifier scope ambiguity: "every non-blue block is next to some blue block" could mean that for each non-blue block there is some blue block next to it, where that blue block could be different for each one; or it could mean that there exists a single blue block that is next to every non-blue block. So language is ambiguous.

[00:22:38] Modality involves words like "must" and "can," and has to do with possible worlds: block 2 is blue in all of these possible worlds, but block 1 is red in only one of them. [00:23:02] Beliefs are interesting. We know that Clark Kent is the same person as Superman, and naively you might think we can substitute the two in all contexts because they're equivalent. But "Lois believes that Superman is a hero" is not the same as "Lois believes that Clark Kent is a hero," and this has to do with the fact that "believes" sets up an opaque context, in which you can't just do substitution.
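To make model theory and the scope ambiguity concrete, here's a minimal sketch in Python. The world (a row of colored blocks) is my own made-up arrangement, not the one on the lecture slides; the point is that the two readings of "every non-blue block is next to some blue block" are different formulas and can disagree on the same world.

```python
# A toy world: block id -> color (blocks sit in a row in id order).
world = {1: "red", 2: "blue", 3: "blue", 4: "green"}

def blue(b):
    return world[b] == "blue"

def next_to(a, b):
    # Neighbors in the row differ by 1 in position.
    return abs(a - b) == 1

# Reading 1: for every non-blue block x, there is SOME blue block y next to it,
# possibly a different y for each x:
#   forall x ( not Blue(x) -> exists y ( Blue(y) and NextTo(x, y) ) )
reading1 = all(any(blue(y) and next_to(x, y) for y in world)
               for x in world if not blue(x))

# Reading 2: there is a SINGLE blue block y next to EVERY non-blue block:
#   exists y ( Blue(y) and forall x ( not Blue(x) -> NextTo(x, y) ) )
reading2 = any(blue(y) and all(next_to(x, y) for x in world if not blue(x))
               for y in world)

print(reading1, reading2)  # True False in this world: the readings diverge
```

In this world, block 1 has blue neighbor 2 and block 4 has blue neighbor 3, so reading 1 holds; but no single blue block is adjacent to both, so reading 2 fails. That divergence is exactly what "scope ambiguity" means.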
There's much more to be said about this if you study linguistics; I just want to give you a flavor for how language can be quite subtle. [00:23:41] Here are some other examples, from pragmatics. Conversational implicature is the phenomenon where there's the sentence you say, but there's additional meaning beyond that sentence. If two people are talking, and A says "What on earth happened to the roast beef?" and B says "The dog is looking very happy," sure, the dog is looking very happy, that's a sentence, but really the implicature is that the dog ate the roast beef.

[00:24:12] Presupposition is subtle but different: it's a background assumption that's independent of the truth of the sentence. If I say "I have stopped eating meat," what's the presupposition? That I was once eating meat. So regardless of whether I have stopped eating meat, even if I said "I didn't stop eating meat," that still presupposes I was once eating meat. Presuppositions are slippery, insidious things that people use to convince other people of things without their knowing. It's really useful to know what a presupposition is, because if someone tries to presuppose something on you, then at least you'll have the language to detect what it is. And it's precisely insidious because it's in the background: you're focusing on "did I stop eating meat?" without realizing that you just accepted a presupposition you might not agree with.

[00:25:13] Paul Grice, the famous philosopher, framed language as a kind of cooperative game between a speaker and a listener, where the dynamics of the game are what give rise to things like conversational implicature and presupposition. This goes back to the earlier point: language is really a game between speakers and listeners who are trying to communicate and agree on something, and the conventions, what language means in all these contexts, are really context-dependent and fluid.

[00:25:55] Just a few other ideas: ambiguity, vagueness, and uncertainty. Let me try to explain what each of these means and how they differ. Ambiguity means that a sentence has more than one possible but precise interpretation. Here are some headlines; let me know what you think of them. "Stolen painting found by tree": what does that mean? How about "Iraqi head seeks arms"? Or "Local high school dropouts cut in half," "Juvenile court to try shooting defendant," "Kids make nutritious snacks," "Ban on nude dancing on governor's desk." I can see you're smiling a little bit: these headlines are funny because they have a serious meaning and then a meaning that is totally ridiculous but nonetheless technically possible.

[00:26:55] Vagueness is where a sentence has just one interpretation, but it doesn't specify the full information. If I said "I had a late lunch," there's no ambiguity there; I just didn't tell you what time I had lunch, maybe one o'clock or two o'clock or something.

[00:27:17] Uncertainty is another form of not knowing something, and it's due to not having a perfect model. Say "the witness was being contumacious": some of you probably know what that means, so you're not uncertain; but some of you probably don't, and you have this uncertainty, which is not a property of the sentence but of the listener's ability to understand natural language. All these things are useful to think about separately, although they often get conflated, especially in more model-free methods.

[00:28:01] I will say that there is another school of linguistics called distributional semantics, which also goes back to the 1950s, and I'll give you the basic idea. If I give you these sentences: "the new design has blank lines," "let's try to keep the kitchen blank," "I forgot to blank out the cabinet," what does "blank" mean, or which word goes there? Someone say the answer. The answer's in the chat? Oh, okay, I didn't know there was a, this is in the Zoom chat. Yep, okay, let's see
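That fill-in-the-blank exercise is the distributional idea in miniature: you can guess the word from its contexts alone. A toy sketch of this, with an invented mini-corpus and a context window size of my own choosing, builds context-count vectors and compares them with cosine similarity:

```python
from collections import Counter
from math import sqrt

# Invented mini-corpus echoing the fill-in-the-blank exercise:
# words that occur in similar contexts should get similar vectors.
corpus = [
    "keep the kitchen clean",
    "keep the kitchen tidy",
    "clean out the cabinet",
    "tidy out the cabinet",
    "the dog ate the roast beef",
]

def context_vector(word, window=1):
    # Count the words appearing within `window` positions of `word`.
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, t in enumerate(tokens):
            if t == word:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts.update(tokens[lo:i] + tokens[i + 1:hi])
    return counts

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in set(u) | set(v))
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# "clean" and "tidy" share their contexts; "clean" and "dog" share none.
sim_clean_tidy = cosine(context_vector("clean"), context_vector("tidy"))
sim_clean_dog = cosine(context_vector("clean"), context_vector("dog"))
print(sim_clean_tidy > sim_clean_dog)
```

Real systems use far larger corpora and learned embeddings rather than raw counts, but the principle is the same one Firth's slogan captures: a word is known by the company it keeps.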
[00:28:51] Uh... I think I lost my... [Music] Okay, chat, okay, there we go. Ah, okay, there are our answers. Okay: "clean." Great. [00:29:06] Okay, got it. [00:29:09] So the idea behind distributional semantics is that I didn't have to tell you what the word means; the meaning of the word is characterized by the contexts in which it appears. This is the idea of the distributional hypothesis: semantically similar words occur in similar contexts. Or, more eloquently said by Firth: you shall know a word by the company it keeps. [00:29:31] So this is another way of thinking about semantics, and actually the one that has really been picked up, because it's so synergistic with modern statistical techniques. [00:29:46] So, just to summarize: there are two ways of thinking about the semantics, or meaning, of sentences. One is compositional semantics, where it's more top-down and you model
first. You think about how language works; you think about parse trees or semantic forms, and you can capture a lot that way. We went through a lot of examples where you can feel that language does obey certain types of structure. On the other hand, you can think about distributional semantics, which is a bottom-up, data-first approach, generally associated with vector spaces, where you don't really try to nail down what meaning is, but just associate a word with the set of contexts in which it appears. [00:30:38] So let me do another poll. Hold on, I need to create this question, and ask: what do you think is the best way to achieve natural language understanding? [00:31:02] Is it compositional semantics or distributional semantics? [00:31:10] Okay, so go to Slido, and
I'm curious what you think. [00:31:20] Okay. [00:31:28] It would have been interesting to go back maybe 10 years and ask this question, because I think the answers would have been quite different, and I'll talk a little bit more about that in a bit. [00:31:40] So it looks like it's about 30 compositional and 70 distributional; maybe a quarter and three quarters. So most of you think that distributional semantics is the way to go, which is maybe concordant with what is happening in the world right now. [00:32:05] Okay, so why don't I take a few minutes right now? I went through a lot of material, so maybe I'll just ask if there are any questions to discuss. [00:32:45] No questions? Someone has to have a question. [00:32:55] So one question is: context information is never spelled out, but the meaning depends on who is speaking and where. Yeah, so I've been deliberately vague about what "context" is. Traditionally it's linguistic context, which is the words next to it in a
sentence, but you could imagine that it could be very much generalized to the context of the speaker: multimodal context, what's going on in the world, who is speaking to whom. All of that rich contextual information is definitely useful for understanding the meaning of a word. [00:33:43] Another question: in the beginning you said that humans have the highest level of communication; how likely is it that some animal actually has a much higher level of communication, but we're not smart enough to understand it? Yeah, that's an interesting question. It's almost a little bit of a philosophical question, because it is in theory possible that some animal has a brilliant system of communication and
we just didn't measure it properly. People have been surprised by how sophisticated certain animals are at communicating: dolphins, or elephants, or even bees. Often people draw the line at having recursion, or language that's able to express compositional thoughts, versus communication which is maybe very contextual and nuanced but doesn't have that level of abstraction. And according to that, I think we're pretty sure that humans are the ones that have the most abstraction. But then again, I guess this is also a very human-centric way of defining what "highest level of communication" means, because maybe some other creatures have communication that is more contextual, more nuanced, than human language. [00:35:19] Elephants
communicating below 20 hertz, in infrasound. [00:35:25] Okay, and Hitchhiker's Guide. Thanks, that's a good one. How about plants? Communication can be chemical, color, temperature, even touch. Yeah, so there's a lot of other kinds, and I guess in general communication is not the same thing as language. [00:35:51] So I'm using "language" very narrowly here to mean human language, and what we know to be human language. Of course, even humans communicate in other ways, like gestures and so on. [00:36:08] Okay, great, let me move on from the questions. What I want to do next is to talk about building language understanding systems, and what has happened over the last 60 years, now that we have maybe a greater appreciation of what language is. [00:36:32] So we've kind of seen this slide before: it's the Turing test. Alan Turing in 1950
asked a philosophical question and devised the Turing test, to test whether a computer (or a machine) could be intelligent, by seeing if it could talk to a human and convince the human that it was actually a person. [00:36:57] This is one of the dialogues from his paper; you can read it. What I want to emphasize here is that Turing was not interested in language, and wasn't trying to design a language understanding test. He was trying to design a test of intelligence, and language was just the means to convince someone, or to verify that there was something, you know, up there. [00:37:26] And so I think this is very interesting from an AI perspective, because language does seem to have a unique capability: in order to have intelligence, you need to be able to
essentially show it in some way. You can think all the deep thoughts in your head, but if you don't have a way of externalizing them, well, it's like the "if a tree falls in the forest, does it make a sound?" setting. So language has turned out to be a very useful way of probing for understanding. [00:38:12] One of the early systems I want to highlight is ELIZA. Joseph Weizenbaum, when he was at MIT, developed this system called ELIZA, which was really meant to be a parody of a psychotherapist. So you would lie on a couch and you would talk to the system, and some dialogues would go like this. ELIZA would say: "Is something troubling you?" "Men are all alike." "What's the connection, do you suppose?" "They're always bugging me about something or
other." "Can you think of a specific example?" "Well, my boyfriend made me come here." "Is it important to you that your boyfriend made you come here?" "He says I'm depressed." And so on. [00:38:55] What was interesting about this system is that the system itself is very simple. This is the 1960s, so there's not that much going on in terms of complexity; it was based on rules and matching. So you can imagine a rule that says: if the word "alike" shows up, then you ask "what is the connection?"; or if you say "always," then ELIZA says "can you think of a specific example?", and so on. [00:39:27] So it was very simple, but what Joseph Weizenbaum found out, which was really surprising, is that the people he showed this to actually started getting emotionally attached, and there was one incident where Weizenbaum's
secretary actually asked Joseph to leave the room, so that the secretary could have a real conversation with ELIZA. [00:39:51] So this was in the 60s, and I think it was perhaps telling of what was to come later. I'll talk about GPT-3, which is obviously a much more realistic version of this, but you can definitely think about some of the consequences of that technology. [00:40:10] Incidentally, Weizenbaum later in his career became very pessimistic, and actually very negative and critical about technology, maybe because he had this epiphany that what we're building is actually maybe not so good after all. [00:40:27] So this is one of my favorite natural language systems. It was built by Terry Winograd, who was also at MIT but moved to Stanford, where he became an HCI faculty member for a
number of years. It's called SHRDLU, and the idea is that you have a person who is able to conduct a dialogue about a blocks-world environment. "Pick up a red block." "Okay." "Grasp the pyramid." The computer is going to say when it doesn't understand things. "Find a block which is taller than the one you are holding and put it into the box." So it's fairly complicated, in that the computer can reason, do anaphora and coreference resolution, ask for clarifications, and so on. [00:41:12] What I think is remarkable about the system is that it was an end-to-end system: it included a parser, and it could do semantic interpretation, dialogue, planning. It wasn't just a language system; in fact it was framed more as an AI system that could allow a robot to do things in the world. And so this was, in some
sense, kind of the first really super-ambitious project for its time. [00:41:46] However, while SHRDLU worked really well in its limited domain, Terry Winograd later wrote this paragraph, which is interesting. He said a number of people suggested to him that this was a dead end: in programming, complex interactions between the components made it just really hard to understand what was going on. So eventually Terry couldn't even extend the program, because it was just too hard to keep in his head. [00:42:20] So this is interesting, because, as we know, language understanding didn't really get solved despite these narrow successes. [00:42:32] And the history of NLP mirrors quite closely the history of AI in general. Remember, in the first lecture I talked about how AI was filled with more of
these logic-based methods, which didn't quite scale. [00:42:51] What's interesting is that at that time in AI in general there were people working on neural networks, although they were a small minority; but in language it was perhaps even less so, because language is actually a discrete communication system. And there was a rich body of work in linguistics, and NLP and linguistics co-evolved in certain ways that made it very natural to embrace all the logical structure that was embodied in language. [00:43:30] But I think people realized that there were cracks showing at the seams, even in the 70s but especially in the late 80s, and in 1990 it was time for a new set of methods to come onto the scene. [00:43:49] So this actually started
a bit earlier, with speech recognition, because speech and language are closely related, and speech is definitely the bridge between the continuous, noisy world, where you want to be doing more pattern-recognition-type things, and the logical world. [00:44:12] So HMMs, hidden Markov models, were developed for speech in the 70s and 80s, and finally in 1990 there was a famous paper from IBM Research, colloquially called the IBM models for machine translation. They developed a probabilistic model that could translate between two languages. Before then, translation was completely logical, grammar- and rule-based, and this was a radical way of thinking about it. It's actually essentially based on a lot of the Bayesian networks that we'll see later in the course.
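The noisy-channel recipe behind the IBM models can be caricatured in a few lines: among candidate English sentences e, pick the one maximizing P(e) * P(f | e) for the observed foreign sentence f. The probability tables below are invented toy numbers, and the word-by-word translation model ignores alignment entirely; the real IBM models learn word-alignment probabilities from parallel corpora.

```python
from math import prod

# Toy language model P(e): how plausible is each English candidate?
P_e = {
    "the house": 0.5,
    "house the": 0.01,
}

# Toy word-translation model: P(foreign word | english word).
P_word = {
    ("la", "the"): 0.7,
    ("maison", "house"): 0.8,
    ("la", "house"): 0.05,
    ("maison", "the"): 0.05,
}

def score(f_words, e):
    """Noisy-channel score P(e) * P(f | e), with a crude word-by-word
    translation model (no alignment), smoothed for unseen word pairs."""
    e_words = e.split()
    p_trans = prod(P_word.get((f, w), 1e-6) for f, w in zip(f_words, e_words))
    return P_e[e] * p_trans

f = ["la", "maison"]  # French input: "la maison"
best = max(P_e, key=lambda e: score(f, e))
print(best)  # → the house
```

The factorization is the point of the design: the language model P(e) penalizes disfluent orderings like "house the," while the translation model penalizes mistranslations, and that split is exactly the kind of Bayesian-network structure mentioned above.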
[00:44:51] So for a lot of the 90s, these so-called generative models (think of them as extensions of Bayesian networks) really dominated NLP. [00:45:01] Then around 2000, people started turning to discriminative models, a.k.a. linear classification, and there was another famous paper which introduced conditional random fields, which marries the structure that was so inherent in language with basically linear classification. [00:45:35] So this was used to do things such as named entity recognition, where you would mark up words as names of people or companies and so on. And so, instead of predicting just one y from x, you predict a bunch of y's from x, where the y's are the labels of all the words. [00:45:56] So this technology was actually quite influential in NLP, but also
more broadly in computer vision, where for much of the 2000s this was the main way people tackled structured tasks. [00:46:18] Another thing I'll mention is latent Dirichlet allocation, which also came from models of language, and here the emphasis is on unsupervised learning: you point LDA at text and it can discover topics in the text. So here's a text where it can discover things like, oh, some words are about budget, some words are about children, and some words are about arts, in an unsupervised way. And this led to a whole cottage industry of topic-modeling papers, and LDA continues to be something that is commonly used in practice. [00:46:59] What I will say is that, with a lot of these developments, it's interesting to think about how they were developed by
someone trying to address a problem in natural language processing, and it led to more general technology that was then applied in all sorts of different areas, like computer vision and genomics and so on. [00:47:26] Okay, so now the 2000s are ending, and we know that at the end of the 2000s deep learning really started gaining momentum. ImageNet was 2009, so it wasn't huge yet, but there were definitely rumblings. And it's interesting, culturally, how the NLP community reacted. At the time, NLP and vision were both very skeptical, and if you think about where NLP had been: a lot of people still viewed language as structure-heavy. Language has a lot of latent structure, and there was no way that this mess of neurons could actually do anything with this kind of
intricate structure. You can think of a lot of the work in the 2000s as a marrying of this structure with statistical methods: putting probabilistic choices on top of a very rich, discrete structural backbone. In some ways this was a reconciliation of compositional semantics with distributional semantics; you have both. But it was still largely based on traditional linguistic thinking. [00:48:59] Then I remember there was this 2011 workshop at NeurIPS, the machine learning conference, and I was at it. There were a bunch of machine learning people who were using vector-based models to argue that this covers semantics, and then you have Ray Mooney, who is much more of a
logic, old-school-AI kind of person, at least at the time. A heated argument broke out, and he is famous for saying that you can't cram the meaning of a whole sentence into a single vector. [00:49:35] Okay, so that captured the attitude at the time. And then things started changing. I think the first move was word2vec, which was a way of taking lots of text and embedding words so that each embedding characterizes the context of that word. Word representations had actually been around since the 90s, but somehow word2vec came at the right time and really caused people to pay attention. And I think one thing people noticed, which gathered a lot of attention, was the fact that you could do
analogies: for example, if you embed things in a vector space, you see that countries and capitals are related by a consistent relationship, with some asterisks. [00:50:31] There was a recent paper from last year which I'll highlight because I thought it was really interesting: six years later, using more or less the simplest method, they ran word2vec on about three million abstracts of materials science papers, just strings, and they were able to discover certain types of patterns by looking at the vector space, and actually predict certain compounds as having certain material properties, like being thermoelectric.
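The analogy arithmetic mentioned above (consistent country-to-capital offsets) can be illustrated with hand-made toy vectors. Real word2vec embeddings are learned from co-occurrence statistics and have hundreds of dimensions; the words and numbers below are purely illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hand-made toy "embeddings" (invented for illustration).
emb = {
    "paris":   [1.0, 0.9, 0.1],
    "france":  [1.0, 0.1, 0.1],
    "rome":    [0.1, 0.9, 1.0],
    "italy":   [0.1, 0.1, 1.0],
    "berlin":  [0.9, 0.9, 0.15],
    "germany": [0.9, 0.1, 0.15],
}

def analogy(a, b, c, emb):
    """Solve a : b :: c : ? by the vector arithmetic b - a + c,
    then return the nearest neighbor among the remaining words."""
    target = [emb[b][i] - emb[a][i] + emb[c][i] for i in range(3)]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("france", "paris", "italy", emb))
```

Here the france-to-paris offset, added to italy, lands closest to rome, which is the country/capital regularity being described (with the same asterisks that apply to the real embeddings).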
So it's an interesting view of how something that's so dead simple, that knows nothing about chemistry and only knows about word co-occurrences, can actually generate some interesting insights. [00:51:25] Now, word2vec wasn't deep learning in the sense that it was only, I guess, one layer, so it was kind of shallow learning. And I think 2014 was when the deep learning community really, in some sense, vindicated itself in the NLP community. There's the sequence-to-sequence learning paper from Google in 2014, where they did machine translation by taking a sentence and running an LSTM over it (if you don't know what that is, it's fine; it's just some black box that embeds the sentence into a single vector), and then, using that vector, it spits out a new sentence. If you watch the module on differentiable programming, it'll give you a better idea of what I'm talking about. [00:52:25] So this was really cramming the meaning of a sentence into a vector, literally. At the time the results were only okay, but it was enough of a proof of concept, and surprising enough, that later extensions of it really transformed into actually usable technology. So it's interesting to look at the progression from rule-based machine translation, where there's no machine learning, to statistical machine translation, which is data-driven but still based on more or less some sort of structure of language, to the neural world, where there's even less structure, and things have gotten better.
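The "cram a sentence into a single vector" bottleneck can be illustrated with a deliberately crude stand-in for the learned encoder: whatever the sentence length, the output has a fixed size. The hashing trick below is an assumption for illustration only; a real seq2seq model learns this mapping with an LSTM or similar network.

```python
def encode(sentence, dim=16):
    """Toy stand-in for a learned sentence encoder: squash a sentence
    of any length into a fixed-size vector by hashing words into
    buckets. (A real LSTM encoder learns this mapping; this only
    illustrates the fixed-size bottleneck.)"""
    vec = [0.0] * dim
    for position, word in enumerate(sentence.lower().split()):
        bucket = hash(word) % dim
        # Crude position weighting so word order matters a little.
        vec[bucket] += 1.0 / (1 + position)
    return vec

short = encode("hello world")
long_ = encode("the quick brown fox jumps over the lazy dog again and again")
# Both vectors have the same length: the whole meaning of either
# sentence has to fit into `dim` numbers, long sentence or not.
```

That fixed bottleneck is exactly what Ray Mooney was objecting to, and what the 2014 result showed could nonetheless work surprisingly well once the encoder is learned.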
There's a researcher, Fred Jelinek, who is famously quoted as saying that every time he fires a linguist, his accuracy goes up. I'm not sure those were his exact words; it's at least kind of an urban legend. [00:53:36] One other note I'll make is that machine translation seems to be the task, at least in NLP but maybe more broadly, that has really pushed the limits of these technologies, and I think it was really the driver that got seq2seq technology going. [00:53:55] So in 2016 Google completely transformed their machine translation system: instead of having multiple systems, one for every language pair (so n-squared pairs), you have one system that can translate between any pair of languages, which was really mind-blowing at the time and is still, I think, impressive. One more thing I'll mention: I think I
mentioned some of these things already, but it's worth highlighting that these statistical methods do have a lot of biases in them. So if you translate these sentences, you get genders appearing that are correlated with certain types of professions. This one is even more extreme: if you take a rare language, where there's not much data, and you pump in something that's just garbage, you get some really disturbing translations coming out. This was from a few years ago, so they might have fixed it, but nonetheless: it turns out you can cram a lot into a vector, but there's some really weird stuff in there. [00:55:06] So maybe this is a good time to pause if there are any questions, before the next wave of slides. There are some questions on Slido. Oh, okay, sure. I guess
I was looking at the Zoom chat. Okay, so, questions. [00:55:28] Oh, whoa, okay. Yes, I'll cover GPT-3. How about body language? I think I mentioned that gestures are typically not studied in NLP, but they're definitely fair game for communication, and there's an interesting subfield of NLP that has to do with grounding, with how people use language in the world, where it's natural to consider gestures. [00:55:59] Let's see. Can we build a common language with precise meaning, so all languages can be referenced against it? This has been tried: there was a language called Lojban which was developed to remove all ambiguity from language, so it would be precise and everyone would know what you mean. Personally, I think it was a
little bit of a fool's errand, because I think that ambiguity is exactly what allows language to be so efficient. The meaning of a sentence is a function not only of the words but also of the context, so if the context already makes some things obvious, then you don't really need to say them. And also, you have to take into account ease of acquisition: almost by construction, it's much easier to learn the languages we've evolved to learn, and something that's designed is generally not going to be very productive. [00:57:17] So, is NLU a combination of plausibility and fluency? I guess I haven't quite defined natural language understanding, because I think there's no accepted definition of it. You can think of it as demonstrating proficiency on a number
of tasks, such as question answering or translation. Or, if you think about generation, you have to think about plausibility and fluency, but also truthfulness. [00:57:53] Why did you choose to do research in NLP as opposed to other areas, and how would you think about what to study in the future? (Sorry, Slido has this icon in the way that makes this hard for me to read.) So, how did I choose to do research in NLP? This is very much a personal thing, so I don't think my answer is necessarily right for everyone. But just the idea of what you can do with language seems so powerful to me. Like I said, it's one of these things that is so uniquely human, and also I think it
seems like a window into understanding cognition, because it's a way to do the I/O, in and out of brains, I suppose. [00:58:58] Is there research on how to incorporate real-time changes in language, like the constant emergence of new words and phrases? Yes, there is a lot of interesting work studying language change over time. There's historical linguistics, which talks about larger-scale changes, Latin to Spanish and French and so on, but there's also an interesting opportunity to do this on the web, because internet language changes very quickly. There are always new words coming up, and on Twitter things can be geo-tagged, so you can actually witness how language spreads over time. So yes, there is an
active area of work thinking about language change, not just new words but also existing words changing in meaning. For example, the word 'awesome', I think, used to mean something more like 'awful', but it flipped in sentiment from negative to positive. [01:00:17] Cool, so maybe I will move on. Thanks for the questions. Okay, so, up until this point: 2014 is when deep learning really started gaining momentum, and from 2014 to 2018 it was really about neuralizing everything. You have neural parsers, coreference resolution systems, named entity recognition systems, everything under the sun. And the numbers went up; things did get better, because the models were just more powerful than what existed previously. I think a
big turning point came in 2018. There's this paper called 'Deep Contextualized Word Representations', maybe better known as ELMo, and the idea behind ELMo can be summarized as follows. Imagine you're trying to do question answering. Our group actually spent quite a bit of work creating the SQuAD question answering dataset, with a hundred thousand examples. It was a lot of work to get that, but in some sense a hundred thousand examples is really, really small compared to the massive amounts of text on the web. So the idea behind pre-training is that you train a language model to predict the next word given the previous context. This is called, I guess, self-supervision: you just make up a task, namely predict the next word given the previous words, and then you learn embeddings.
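The self-supervision setup just described, turning raw text into a prediction task with no human labels, can be sketched with a toy bigram counter. The corpus is invented for illustration; real pre-trained models learn a neural network for the same next-word objective instead of counting.

```python
from collections import defaultdict, Counter

def train_next_word(text):
    """Self-supervision in miniature: the 'labels' are just the next
    words already present in raw text, so no annotation is needed.
    This builds bigram counts; ELMo-style models train a network on
    the same prediction task."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Most frequent continuation seen after `word` in training text."""
    return counts[word.lower()].most_common(1)[0][0]

# Toy corpus (invented for illustration).
corpus = ("the cat sat on the mat . the cat sat by the door . "
          "the dog ate the bone .")
model = train_next_word(corpus)
print(predict_next(model, "cat"))
print(predict_next(model, "the"))
```

The point is that the supervision comes for free from the text itself, which is why pre-training can use the whole web while SQuAD-style labeled data stays comparatively tiny.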
You then use those embeddings to drive some downstream tasks where you have much fewer labeled examples. And the result was that across the board, across a number of different benchmarks, the accuracies went up by a few points. I guess it's maybe hard to really appreciate what three points means, but these systems were already hard to improve: a one-point gain was very good, and this was a substantial gain across a wide variety of tasks. So this got a lot of NLP people really excited. [01:02:24] Later that year, BERT came out from Google. I'm not going to go into the details; if you watch the differentiable programming lecture, I explain a bit more about what BERT is doing. But think of it as a masked language model, which is, roughly, predict a word given its context.
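The masked-prediction objective can be sketched the same way: hide a word and predict it from its surrounding context. The counting model and toy sentences below are invented for illustration; BERT learns a deep network for this fill-in-the-blank task rather than a lookup table.

```python
from collections import Counter

def train_masked(sentences):
    """Toy masked-word model: for each interior position, record which
    words were seen between a given (left, right) context pair. This
    counting only illustrates the training signal BERT learns from."""
    table = Counter()
    for s in sentences:
        w = s.lower().split()
        for i in range(1, len(w) - 1):
            table[(w[i - 1], w[i + 1], w[i])] += 1
    return table

def fill_mask(table, left, right):
    """Most frequent word seen between `left` and `right` in training."""
    candidates = {w: c for (l, r, w), c in table.items()
                  if l == left and r == right}
    return max(candidates, key=candidates.get)

# Toy training sentences (invented for illustration).
sents = ["the cat sat on the mat",
         "the cat sat on the rug",
         "the dog slept on the mat"]
table = train_masked(sents)
print(fill_mask(table, "cat", "on"))
```

The difference from the next-word objective is only the direction of the conditioning: context on both sides rather than just the left.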
This was more or less the same idea scaled up and engineered properly, and it again yielded huge gains over previous methods. [01:03:06] So this really changed the game in NLP, from having specific architectures for different tasks to a world where you have one architecture that does multiple tasks. I guess I didn't mention that BERT, or BERT plus friends, some Muppet, is now used to power essentially all the downstream NLP tasks, like coreference resolution or semantic parsing and so on. So it really brings us one step closer to having a unified representation, one model that can rule them all, in a sense. [01:03:49] Going back to reading comprehension, one thing that is remarkable is that if you look at the
leaderboard and the accuracies, they're way above human-level performance, so these systems look like they're getting superhuman performance. But one thing we did a few years ago was to really probe whether these systems actually understood language. So here's a paragraph and a question: 'The number of new Huguenot colonists declined after what year?' The model correctly answers 1700, which is right here. But if you add a distracting sentence, which looks like it answers the question but doesn't, then BERT will get distracted and answer the wrong thing. And quantitatively, all the systems fall by quite a bit when you add this distracting sentence, whereas humans obviously don't get fooled by it as much. So one thing to keep in mind is that while
models have gotten impressive results on benchmarks, there are still these blind spots, which means that solving a benchmark is not the same as solving the actual underlying task. That can be misleading if you read headlines saying computers can read better than humans; that's just not true. Computers do SQuAD better than humans; that is true. [01:05:21] And what's a little more worrisome is that these models can be easy to break, but we don't actually know how to fix them, except by training larger models and hoping that they break less. [01:05:36] Okay, any questions before I talk about GPT-3? That's going to be the last topic. Okay, so: is naming algorithms after cartoon TV characters a thing, or just a coincidence for the two instances? I would gather that it's very much not a coincidence, because
coincidence, because you also had ERNIE that came out afterwards, and Big Bird — clearly people are going along with a theme. There's another cast of characters: BART and MARGE. I guess they're not Muppets, they're from The Simpsons, and that's another line that Facebook has been pursuing. [01:06:21] Is there any work on improving comprehension — reading between the lines, understanding? It's actually interesting, because these large models are so contextual and leverage so much external world knowledge about text that they're almost reading too much between the lines. They're making inferences and making assumptions, which is what leads to all these biases in the models: it's not stated in the text, they're just learning from associations. [01:07:00] Okay, let me move on to get to
the final thing. So, May 2020: OpenAI releases GPT-3. I'm skipping a bunch of other models like GPT and GPT-2 in the interest of time. This was essentially a big language model — I mean, big is an understatement; it's a ginormous language model trained on Common Crawl, which is our best approximation of the internet, so to speak. It has 175 billion parameters, whereas BERT had maybe 300 million parameters or so, so this is much, much larger. [01:07:46] The interesting thing about GPT-3 is this ability to do what they call in-context learning. Traditionally, if you use BERT, what you would do is: you have a model, you show an example and perform an update, you show another example and perform an update, and so on. This is called fine-tuning a language model. But GPT-3 showed that that was
actually not necessary to get some interesting performance. So you could actually do zero-shot learning: you could say "translate English to French", then "cheese", and then you give it the prompt, and it will actually do something reasonable. Or you give one example, or a few examples. And notice that this is not training data in a conventional sense, where you optimize a loss — this is actually the input into the language model. All a language model does is: you give it a string and ask it to predict the next string. So this is encoding a task in natural language, which is a very different way of thinking about learning. [01:08:57] Let's see, do I have enough time? Maybe I'll... So OpenAI has this Playground; let me just show you a little bit. There are many things you can do. You
can — [01:09:12] so this is the prompt, and I'm going to say: "How can I help you today?" — "Can you tell me the difference between SARSA and Q-learning?" I actually don't know whether this will work. Okay: "Sure... policy inspired by Q-learning..." Okay, so this doesn't really answer the question. That was nice, but it didn't answer the question. [01:09:52] How about something else: "Who founded Microsoft?" Okay, so this is also a lie. So you can see that it generates fluent English, but it sometimes doesn't have the best tendency to tell the truth. [01:10:13] This is another example. Again, this is a prompt — let me dig this up. This is kind of from the course syllabus. [01:10:36] So this is some complicated expression, and then this is
again what you feed into GPT-3, and it says: "Artificial intelligence is the magic that makes computers do things that they're not supposed to do, like talking to their cars." Interesting. Okay, so you can judge for yourself what to make of this. Anyway, if you have access, you can burn quite a bit of time playing with this. [01:11:02] Another example I'll show: you can train it on CS221 quizzes. You train it on quiz one questions and answers, and you see how well it does on quiz 2.
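The in-context learning idea from a moment ago — the task description and any demonstrations go into the model as one input string, with no parameter updates — can be sketched in a few lines of Python. This is a minimal illustration under my own naming, not OpenAI's actual API; sending the prompt to a model is left out.

```python
# Minimal sketch of in-context learning: the "examples" are just part of
# the input string fed to the language model -- no gradient update happens.

def build_prompt(task, demonstrations, query):
    """Encode a task description plus demonstrations as one text prompt."""
    lines = [task]
    for source, target in demonstrations:
        # Demonstrations are conditioning text, not a training set.
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is asked to continue from here
    return "\n".join(lines)

# Zero-shot: only the task description, no demonstrations at all.
zero_shot = build_prompt("Translate English to French:", [], "cheese")

# Few-shot: a handful of demonstrations included in the prompt itself.
few_shot = build_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "mint",
)
print(few_shot)
```

Fine-tuning, by contrast, would loop over the (source, target) pairs and take a gradient step on each; here they are only text that the model conditions on when predicting the continuation.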
[01:11:15] So here are the prompts. What is bold is given to the algorithm — this should be familiar from quiz one — and then you ask it which of the following are examples of regression. It answers A, C, and D, which is wrong, but then it offers an explanation, which starts off on target: examples of supervised regression — house price estimation, which is actually pretty good; spam detection, which contradicts what it answered; and this "unsupervised regression", which I don't know is a thing. [01:12:00] So you can get a sense that GPT-3 is really good at generating text which, if I didn't tell you it was fake, you'd probably read and believe — you have to look pretty closely now to know that something is up. [01:12:25] Okay, so I'm
trying to give you maybe a more balanced view of GPT-3. I think if you go on the internet and look at Twitter, you'll just be completely blown away by all the awesome things people are building out there, but I want to balance that out with some things which say, yeah, maybe it's not doing everything. [01:12:45] Several other things to mention. Gender bias, I think, is very much on people's minds in NLP; they have an entire section in the paper that calls out that, yes, not surprisingly, GPT-3 has gender bias. I mentioned before that there was actually someone who used GPT-3 to generate a blog post, and it ended up number one on Hacker News for a while. And there are even papers coming out saying that GPT-3, primed with extremist context, can generate more extremist content, which is again perhaps not
surprising, given that this was trained on the internet, and the internet has many lovely things in it. [01:13:28] So, some things to think about. One clear question here is: can we make GPT-3 more unbiased and less toxic? Certainly out of the box, this is just a wild tool — you should not be using GPT-3 directly to generate things to show people. Another question is: what are the societal impacts of automatic text generation? This is relevant even if it's not generating biased things, but especially if it is. The obvious worry is that fake news is already a problem, and this is just a massive amplifier for producing large amounts of credible-sounding text, which just swamps out anything anyone
wants to say. So that's potentially a pretty dangerous world. [01:14:29] On the more scientific side, I think there's still the outstanding question of whether we can achieve language understanding without a model of the world, or without world experience. GPT-3 is only trained on text, and furthermore it's just a large Transformer model with no internal structure. One question is: can it be said to understand language, or does it need that grounding experience? If there were more time, I would actually run a poll and get people to discuss this, because I think it's actually not obvious what the answer is. [01:15:13] Maybe I'll end with final remarks and then take questions. So, starting at the top, I want to
highlight that language is an incredibly rich, complex communication system — quote-unquote, "glorious chaos" — which I think is fascinating to study. At the same time, there's a lot of regularity and structure. There has to be, because otherwise we wouldn't be able to productively coordinate and use language as systematically as we have been able to. Current models kind of ignore this, but it's also unclear how to incorporate this information in a way that actually matters. [01:15:55] As you can see, the field is moving at an incredible pace: GPT-3 was in 2020,
BERT was only two years ago, seq2seq was six years ago, and in just the last six years NLP has been completely transformed — I would say even in the last two years it's been completely transformed. So it's interesting what will happen in the next year, and I think there's even more urgency to understand what these models are doing and make sure the technology is directed in the right way. [01:16:29] And then of course there's a lot more to be said and learned. There are wonderful classes at Stanford that you can take — CS124, CS224N — to learn more about NLP. So with that, I will take any questions. [01:16:52] Okay: is voice language a better source than text language — so, spoken audio? It depends what you mean by better. Voice does contain prosodic cues, which contain
more information than what's just in the text. Unfortunately, there's not that much of it compared to all the text on the web. I think a lot of the issues of language understanding can be somewhat factored out from speech: often people convert spoken language into written language and take it from there, although there's definitely interesting research in speech as well. [01:17:38] How do I feel about whether understanding multiple languages is a plus for research in NLP? A lot of research in NLP is very English-centric, which I think is a potential problem if you think about fairness. Many rare languages — low-resource languages, as they're called — don't enjoy
such high accuracies, and those are probably the people who need it the most. So there is a community of people in NLP who are very much interested in designing more efficient learning algorithms that help low-resource languages. [01:18:24] Finally, how has the study of NLP helped linguistics grow as a field? This is a really interesting question; unfortunately, I won't have time to really do it justice. Chris Potts, a linguist at Stanford, has lots of insightful opinions about this. Linguistics is interesting because, for much of its history, it was so dominated by formal grammars and semantics — starting from Chomsky, in some ways — that the whole field came to be dominated by a certain way of thinking, and some of these other perspectives haven't really had a
chance to breathe. I think a lot of this formal semantics is still very much in the logical tradition, where you have sentences that you look at carefully and try to intuit what might be going on. It's a very different approach from looking at a broad corpus and trying to make sense of what's there. But this is starting to change a bit; at least people are thinking about how the types of models we're developing in machine learning could be useful. Unfortunately, I think it's a hard ask, because in some ways these deep learning models don't give you much more than an existence proof that certain types of data lead to certain types of behavior, and that doesn't necessarily give you insight into language
itself, because interpretability is kind of a missing component. But I think this is a really great question, and it's interesting to ask how these models can actually help us understand language better — help *us* understand language better, not help computers exhibit better language understanding. [01:20:31] Can we alter or add to the objective function of modern models to make them more logical and coherent? This is a natural direction that a lot of people are thinking about: why can't you have both — the richness of modern neural models plus some logic? There's been a bunch of work that tries to fuse the two together. Personally, I feel this is far from being solved by a simple combination, because I think
there's something maybe deeper about how things should be structured. You can add a regularizer that makes BERT more consistent, or GPT-3 more consistent, but the problem is that the reason we're using these models in the first place is that what they capture can't be captured by logical regularities, and many of the advantages we're getting from them are in places where logic fails to deliver. So in those areas, we don't really have the option of just slapping on a logical regularizer — otherwise we would have just built a logical system. [01:21:51] How far off is real-time spoken communication in multiple languages? So this is simultaneous translation: if I'm on Skype and I speak in English and it comes out in Japanese
or something. This is, I would say, not mature technology by any means right now, but it's coming, and there's been work on it. First of all, speech recognition is getting really, really good. Then the other main challenge is machine translation: what makes real time difficult is that word order differs across languages, so you can't translate word by word — you have to wait a bit to get enough context and then translate, and so forth. But there are models in NLP that try to do this, and I think you can do a lot by essentially predicting what the speaker is going to say. So a lot of this can be done without even a deep understanding of language. What I think we've
[01:23:03] What I think we've learned from translation is that while translation is getting 90% of the way there, that doesn't really require any understanding of language; it's just matching symbols contextually. Getting the remaining bit, and having translations that you can actually trust and that are nuanced and proper, is going to require quite a bit more work.

[01:23:33] Okay, another question: are there examples of building languages with RL, making agents communicate in a multi-agent environment? There is a bunch of work on what is called multi-agent communication, where people set up some sort of environment and train a bunch of RL agents to act in it, where one of the actions is to talk.
[01:24:07] Right, so this is an interesting experiment where you can actually get certain types of languages to evolve from this procedure. Language does help the agents play the game, or solve the environment, better, but it's rare that these languages automatically line up with our notions of natural language, because often these worlds are too limited for language to need to take on that kind of richness. And also, there's no pressure for that: human language is probably not optimal for anything in particular; it's just what we have and what we happened to evolve. But that's a good question.

[01:25:07] What are major problems in society that advancing NLP will solve, versus what problems it may create?
[01:25:15] There are a bunch of places where NLP can be used for societal good. One thing it can do, in principle, is allow a broader set of people, especially people who might not speak English, to tap into English-language resources, breaking down multilingual barriers. It's also been useful for analysis, for studying how people talk. Dan Jurafsky and others have a project analyzing how the language of police officers during stops compares depending on whether they're stopping a Black person or a white person, using NLP techniques to study that question. So one huge area is that language is used in a social and societal context.
[01:26:19] And therefore, building tools that help us manage and navigate that societal landscape can be really interesting. Problems it can cause: certainly fake text generation, and biases in models. If we start trusting translations or other systems, it could lead to amplification effects where the haves and the have-nots get pushed farther apart.

[01:26:51] All right, well, I think we're out of time, so thanks everyone for coming and listening, and have a good rest of the week.

================================================================================
LECTURE 056
================================================================================

General Conclusion | Stanford CS221: AI (Autumn 2021)
Source: https://www.youtube.com/watch?v=iUGmupxCdjs

---

Transcript

[00:00:05] Welcome everyone to the final lecture. Let me just share my screen and we can get going. Okay, so this lecture is going to be broken up into three parts. First, I'm going to do a quick recap of the class.
[00:00:29] Then I'm going to talk about future classes that you might take; hopefully this class has piqued your interest in AI. And then, finally, I'm going to end with some broad remarks on where AI is going and what we should all keep in mind. Since it's a live lecture, feel free to interrupt and ask questions; I'll monitor the chat, or Samara, if you notice anything, you can flag it to me as well.

[00:00:55] All right, so let's begin with a recap. First, congratulations on making it this far in the quarter; we've covered a lot of ground, so I'm just going to highlight some of the key things that you should keep in mind. Recall we started with the modeling-inference-learning paradigm. Modeling is the "what": it's about how you build a mathematical model that approximates the real world; it might be a neural network, it might be a Bayesian network.
[00:01:26] Inference is the process of using the mathematical model to answer questions. It's trivial for neural networks but can be really hard for Bayesian networks. And learning is how you take data and produce a model, so that you can do inference on it.

[00:01:43] In this course we talked about machine learning, then reflex-based models, state-based models, variable-based models, and logic, so let me go through each of them in turn. In machine learning, we presented the loss minimization framework, where you have a training set and you want to find parameters that minimize some loss. One thing I want to stress is how general a principle this is: the loss captures basically what you want a classifier to have, and we explored a few different types of losses depending on the task.
[00:02:12] And then we had a fairly simple algorithm, stochastic gradient descent, that was able to approximately optimize these objective functions. This is really the workhorse of machine learning; these two slides are most of machine learning. At least these days, it can be captured by writing down a loss function and optimizing it, and that works for neural networks, and it works for clustering problems like k-means, and so on.

[00:02:47] So I want to underscore that machine learning is a general way of being: it's the idea of taking data and turning it into models. But there are multiple types of models. We looked at reflex-based models in the very beginning: linear models, neural networks, nearest neighbors. Inference is just a feed-forward pass through the neural network, and for learning we used stochastic gradient descent, or k-means in the case of clustering.
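The loss-minimization recipe described here can be made concrete in a few lines. This is a minimal sketch, not code from the course: a linear classifier trained by stochastic (sub)gradient descent on the hinge loss, with a made-up toy dataset.

```python
import random

# Toy training set: (feature vector, label in {-1, +1}); data is made up.
train = [((1.0, 2.0), 1), ((2.0, 0.5), 1), ((-1.0, -1.5), -1), ((-2.0, 1.0), -1)]

def sgd(data, epochs=100, eta=0.1):
    """Approximately minimize the total hinge loss with stochastic gradient descent."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        random.shuffle(data)
        for x, y in data:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1:  # hinge-loss subgradient is -y*x when margin < 1
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

w = sgd(list(train))
# Every training point should now be on the correct side of the boundary.
print(all(y * sum(wi * xi for wi, xi in zip(w, x)) > 0 for x, y in train))
```

The same loop handles other tasks by swapping the loss: replace the hinge subgradient with, say, the logistic-loss gradient and everything else stays unchanged.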
[00:03:16] Then we looked at problems where you aren't interested in just a single decision but in a sequence of decisions, say getting from point A to point B, and we embarked on the journey of state-based models. Here, the idea of a state is a summary of all the past actions sufficient to choose future actions optimally; that crisply encapsulates what a state-based model is, and you've had lots of practice coming up with state-based models for various problems. If they're deterministic, those are called search problems, and you can use uniform cost search or A*. If you have randomness, you move to Markov decision processes, where you can use things like value iteration for inference. And games capture the cases where there's an adversary and you have to use a minimax formulation.
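Value iteration for MDPs, mentioned in this recap, can be sketched compactly. The tiny MDP below (states, actions, transitions, rewards) is invented purely for illustration:

```python
# Value iteration on a tiny, made-up MDP.
# States: 0, 1, 2 (state 2 is terminal); actions: 'a', 'b'.
# transitions[s][a] = list of (next_state, probability, reward)
transitions = {
    0: {'a': [(0, 0.5, 0.0), (1, 0.5, 1.0)], 'b': [(1, 1.0, 0.5)]},
    1: {'a': [(2, 1.0, 10.0)], 'b': [(0, 1.0, 0.0)]},
}
gamma = 0.9  # discount factor

V = {0: 0.0, 1: 0.0, 2: 0.0}  # terminal state 2 keeps value 0
for _ in range(100):  # sweep until (approximately) converged
    for s in transitions:
        # Bellman optimality update: V(s) = max_a sum_s' p * (r + gamma * V(s'))
        V[s] = max(
            sum(p * (r + gamma * V[s2]) for s2, p, r in outcomes)
            for outcomes in transitions[s].values()
        )
print(V[0], V[1])  # converges to V[0] = 9.5, V[1] = 10.0
```

Each sweep applies the Bellman optimality update to every state; for this toy problem the values stop changing after a few sweeps.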
[00:04:04] For search problems we didn't really touch on learning, although you can do that; and for MDPs and games we have these reinforcement learning algorithms. So you can really think of reinforcement learning as machine learning for state-based models where there's randomness in the environment.

[00:04:24] Then we moved on to variable-based models, which are a higher level of abstraction, a different modeling language if you will. The key idea here is a factor graph, which captures a set of variables whose values you want to determine, and factors (those are the little squares) which capture dependencies between variables. The key point is that the factors are generally local, but the questions you want to answer are global. When they're deterministic, we have constraint satisfaction problems, for things like scheduling, and we looked at backtracking and beam search.
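Backtracking search for a CSP can be sketched in a few lines; the two-color graph-coloring instance below is a made-up example, not one from the lectures:

```python
# Backtracking search for a toy CSP: color a small graph with two colors
# so that adjacent nodes differ (an invented instance).
variables = ['A', 'B', 'C']
domain = ['red', 'blue']
edges = [('A', 'B'), ('B', 'C')]  # binary "not equal" constraints

def consistent(assignment):
    # Check all constraints whose variables are assigned so far.
    return all(assignment[u] != assignment[v]
               for u, v in edges
               if u in assignment and v in assignment)

def backtrack(assignment):
    if len(assignment) == len(variables):
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domain:
        assignment[var] = value
        if consistent(assignment):  # prune inconsistent partial assignments
            result = backtrack(assignment)
            if result is not None:
                return result
        del assignment[var]  # undo and try the next value
    return None

solution = backtrack({})
print(solution)  # {'A': 'red', 'B': 'blue', 'C': 'red'}
```

The pruning step is what distinguishes backtracking from brute-force enumeration: inconsistent partial assignments are abandoned before being extended.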
[00:05:02] If you put on your probability hat, then we can turn these factor graphs into Markov networks by defining a distribution over all the random variables; for inference we looked at Gibbs sampling, but there are other methods as well. And then, to give more of an interpretation to how the factors are constructed, we looked at Bayesian networks, where each of the factors is a local conditional probability, and we looked at forward-backward and particle filtering methods, at least for chain-structured Bayesian networks. There's much more to be said here; this is just a taste of variable-based models.

[00:05:39] For learning, we only looked at learning for Bayesian networks, based on the maximum likelihood principle, but you can apply maximum likelihood to any probabilistic model, including Markov networks.
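For a Bayesian network, maximum likelihood learning is closed form: you count and normalize. A minimal sketch, with an invented two-variable network (R → W) and made-up data:

```python
from collections import Counter

# Maximum likelihood for a made-up two-variable Bayesian network R -> W
# (Rain -> Wet). Estimating p(w | r) is literally count-and-normalize.
data = [('rain', 'wet'), ('rain', 'wet'), ('rain', 'dry'),
        ('sun', 'dry'), ('sun', 'dry'), ('sun', 'wet')]

counts = Counter(data)               # count(r, w)
totals = Counter(r for r, _ in data) # count(r)

# p(w | r) = count(r, w) / count(r)
p_w_given_r = {(r, w): c / totals[r] for (r, w), c in counts.items()}
print(p_w_given_r[('rain', 'wet')])  # 2/3
```

With latent variables the same normalize step appears inside EM, except the counts are expected counts imputed by inference rather than observed ones.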
[00:05:49] For Bayesian networks it was really nice, because learning is closed form: you just count and normalize. For latent-variable models you have the expectation maximization algorithm, where you have to use inference to impute the missing variables, and then you count and normalize.

[00:06:08] And finally, we looked at logic-based models. The idea of logic is that it goes one level of abstraction higher: it introduces formulas, which allow you to represent more powerful things, even infinite things; you can talk about all the even numbers, for example, which is an infinite set. We looked at two models, propositional logic and first-order logic. Inference is generally pretty hard: for propositional logic you can do model checking, or you can work with the inference rules directly, which is one of the nice things about having logical rules.
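Model checking for propositional logic can be sketched as brute-force enumeration over truth assignments. The knowledge base below is a made-up example; entailment is tested by checking the query in every model of the KB:

```python
from itertools import product

# Brute-force model checking: does the knowledge base entail the query?
# A formula is represented as a function from an assignment (dict) to bool.
symbols = ['rain', 'wet']
kb = [
    lambda m: (not m['rain']) or m['wet'],  # rain -> wet
    lambda m: m['rain'],                    # rain
]
query = lambda m: m['wet']

def entails(kb, query, symbols):
    # KB entails query iff query holds in every model satisfying the KB.
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if all(f(model) for f in kb) and not query(model):
            return False  # found a model of the KB where the query fails
    return True

print(entails(kb, query, symbols))  # True: this entailment is modus ponens
```

Enumeration is exponential in the number of symbols, which is why working directly with inference rules like modus ponens and resolution can be attractive.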
[00:06:50] Modus ponens and resolution are different inference methods. Sometimes I like to say that logic is about how you can express very complicated things very succinctly. Learning we didn't get a chance to talk about, but there are ways to bring machine learning to logic as well.

[00:07:10] So, hopefully, this is how you see CS221. If you're seeing a lot of this material for the first time, it can be a little overwhelming; there are just so many models, tools, and methods. But I hope that this organization gives you a way to think about how everything fits together. I don't want you to think, "okay, there's nearest neighbors and there's Bayesian networks, how are they related?"
[00:07:38] I want you to think about the trajectory: there's a bunch of models, bucketed into reflex-based, state-based, variable-based, and logic, and then you can do learning on top of that, and that gives you a much more nuanced and holistic picture of all the methods in AI.

[00:08:02] And what's important, I think, is that the individual methods, like whether you use particle filtering, will change over time, and in general, in applications, you might have to use something a bit more sophisticated. So hopefully this class has imparted on you a way of thinking about modeling, inference, and learning as separate, so that whenever you encounter a new algorithm in a paper somewhere, you can incorporate it into your conceptual map.

[00:08:34] Okay, that's all I want to say about the recap. Are there any questions?
[00:08:43] Can you throw some light on what should be the first tool we should use if we are presented with a problem?

[00:08:52] What tool should you try first? It really depends on the problem. These days it's very natural and easy to throw machine learning, supervised classification, at the problem, and that makes sense when your problem involves basically a single action, it's high-dimensional, and you don't really know what else to do with it. But for many problems, like scheduling or route planning, something a bit more structured, you wouldn't necessarily want to start with machine learning, because to start with machine learning you need to gather data, and if you don't have data, then that might not be the best place to start. So I don't think there's any one place to start.
[00:09:43] Hopefully you can think of the CS221 toolbox as the first layer in a breadth-first search: these are the different options you should think about. Is machine learning good, or should this be a search problem, or should this be a Bayesian network, for example?

[00:10:03] A comment from the chat: most current machine learning is reflex-based, a low-level intelligence compared to logic. Interesting point. We are in a very interesting time where a lot of what we see is machine learning, and it's also very impressive how a lot of the so-called reflex-based models are actually capable of doing some fairly sophisticated things. If you think about AlphaGo, yes, there was Monte Carlo tree search that allowed you to actually build a competitive agent, but even just classifying a game board, I mean, that could definitely beat me at Go.
[00:10:49] I think in cognitive science people talk about System 1 and System 2, and the two kind of coexist. System 1 is the reflexive agent, making guesses at what the right thing to do should be, and System 2 is the more rational, well-thought-out, reasoned path. I think we need both, and the two need to coexist and feed off of each other.

[00:11:16] Could you give some examples of ML methods for search problems and logic problems? For search, a lot of things can be cast as search problems. There's a whole field called structured prediction, where your goal is to output a structure, let's say a graph or a sentence or something, and in those cases you often want to learn how to do that.
[00:11:49] So there you actually combine some search techniques: the inference algorithm becomes search, rather than just a feed-forward pass through a neural network, but the learning part is still the same. We didn't talk about structured prediction, I think, but it's nice; you should look at it. I think it's in the slides still: you make a prediction using an inference algorithm, you compare that with the correct prediction, and you do a gradient update.

[00:12:24] And for logic, there are similar things you could do. For example, Markov logic is a way of combining logic with Markov networks, and Markov networks you can estimate using maximum likelihood.

[00:12:42] All right, great questions. Feel free to put more in the chat, but I'm going to move on for now, since there are a few other things to get through.
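The predict-compare-update loop just described is essentially the structured perceptron. Here is a minimal sketch on an invented tagging toy, where "inference" is an exhaustive search over tag sequences (all names and data are made up):

```python
from itertools import product

# Structured perceptron sketch on a made-up tagging toy.
# Score of a tag sequence = sum of weights for (word, tag) pairs.
tags = ['D', 'N']
train = [(('the', 'dog'), ('D', 'N')), (('a', 'cat'), ('D', 'N'))]

def score(w, words, labels):
    return sum(w.get((word, tag), 0.0) for word, tag in zip(words, labels))

def predict(w, words):
    # "Inference" here is brute-force search over all tag sequences.
    return max(product(tags, repeat=len(words)), key=lambda ls: score(w, words, ls))

w = {}
for _ in range(5):
    for words, gold in train:
        guess = predict(w, words)       # predict via inference
        if guess != gold:               # compare with the correct prediction
            # Update: move weights toward the gold structure, away from the guess.
            for word, tag in zip(words, gold):
                w[(word, tag)] = w.get((word, tag), 0.0) + 1.0
            for word, tag in zip(words, guess):
                w[(word, tag)] = w.get((word, tag), 0.0) - 1.0

print(all(predict(w, words) == gold for words, gold in train))
```

For realistic sequence lengths the brute-force `predict` would be replaced by a proper search (e.g. dynamic programming), but the learning loop stays exactly the same, which is the point being made in the answer above.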
[00:12:54] Okay. So now you've taken CS221; maybe this was your first class, maybe you've taken a bunch of other classes. I want to talk about what else is related to CS221. First of all, I'm not going to give you the complete list of courses; you can see the list of AI courses on this website, but that isn't even the whole list of courses which I think are relevant. So what I've done here is try to help us understand what types of courses you might be interested in and why you might be interested in them, and then I'm going to go through each category and give a few examples of the most popular ones.

[00:13:33] The most obvious type of course is: well, we've taken 221, we've learned about some methods, let's learn more about methods; let's learn about Markov chain Monte Carlo in general, for
example. And these tend to be more general purpose. But that's not the only type of course that I think is relevant. Applications are extremely useful; of course we're interested in applications, because that's the real impact of AI, when it's applied to things. But also, in the other direction, often when you take an applied class you learn the method much better than if you take an abstract one, because you appreciate when it works, when it doesn't work, all the nuances.

[00:14:18] And then finally, I would really stress investing in building depth, for both methods and applications. Usually these are courses not in AI: on the methods side, maybe investigating more mathematical foundations; for an applied area, if you're interested in computational biology, take a biology class. I think these
days it's too easy to go through an AI curriculum and not really have much depth, because you can do a lot just by downloading packages and running data. But if you really want to, especially if you're thinking about research, having depth can distinguish you and make you more able to come up with new insights and ideas.

[00:15:08] So let's start with methods. These are categorized by the different topic areas that we've covered in the class. First is machine learning: everyone probably knows about CS229, which is the standard, poster-child machine learning class. Compared to CS221 (this question comes up a lot), it's more mathematical derivations rather than as much
programming, and there are more continuous variables; CS221 tries to shield you from that and deal with discrete variables, just in the interest of scoping. And you learn some fancier things like kernel methods and PCA. So if you really want to dig more into machine learning, that's the class for you.

[00:15:53] If you're looking more for "how do I apply machine learning", especially deep learning, which has been increasingly important, CS230 will tell you how to train these deep neural networks, which have a lot of bells and whistles and things that you need to know about, like dropout and batch norm, to make them work. So if you're really interested in the general practice of how you get deep learning to work, that's the class for you.

[00:16:24] There are three other classes that I want to mention; of course
there's more. The first class is machine learning under distribution shift. We've mentioned a few times that machine learning fails when the training distribution isn't the same as the test distribution; for example, there are adversarial examples, and this class is aimed at telling you what's going on there and what you can do about it.

[00:16:52] Often you think about machine learning as one task at a time, but increasingly we're seeing much more general learning tools that allow you to generalize across multiple tasks, so that's pretty exciting. And finally, we often think about machine learning on single data points, which are less structured, but machine learning can be done in the context of graphs, so there's a class about that as well.

[00:17:21] Then there's reinforcement learning: if
you like the reinforcement learning section and want to know more advanced methods, take CS234. You get to learn about policy search, whereas we've looked more at things like Q-learning, which is a value-function method.

[00:17:39] Then there's decision making under uncertainty, from the Aero/Astro department, which focuses more on model-based methods, if you remember the distinction between model-based and model-free. In more serious applications you really want to have a model of what's going on in the world, and then you can do things like planning rather than just being a reflex agent.

[00:18:04] Generative models: if you enjoyed Bayesian networks and Markov networks, this is the class for you: CS228, Probabilistic Graphical Models. It's a fairly natural extension of the
things that we've talked about: fancier inference algorithms, how you learn the structure, and so on. In the last, I guess, five years there's been a surge of interest in generative models which are supercharged with deep learning. Probably many people have seen GANs generating really photorealistic images; this is all enabled by deep generative models, which build on the principles of generative models, but you combine them with deep learning and you get really interesting results.

[00:18:52] So let's talk a little bit about applications. I'm only going to talk about three applications: vision, language, and robotics. Of course there's computational biology, there's healthcare, and there are other things which I'm not going to have time to mention.

[00:19:05] So, vision. There's kind of a stock,
I mean, the canonical vision class is, I guess at this point, CS231N. It's fairly machine-learning heavy; you learn about convnets and transformers, so it's more general purpose than vision, but you talk about some vision-specific tasks like detection, segmentation, generation.

[00:19:33] There's CS231A, which is more on the vision side. So if you feel like you already know your ML but you really want to learn more about vision, this might be a good class for you, because vision ultimately is about how light works in a 3D world, and so you get into that.

[00:19:57] There's also, I think, a newish class on how AI intersects with graphics, which is kind of a close cousin of vision, and this has some emphasis on generating things, like generating
animation, but also a much more in-depth emphasis on rendering and geometry.

[00:20:22] Okay, so robotics. There's Introduction to Robotics, where you learn how to work with physical models of your robots, how to move arms, and how to relate joint angles to what the robot actually does in the real world. CS237 has a little bit more learning involved, because for more complex robotics tasks you can't really do everything from first principles, so there's some learning involved, but you still need to look at the structure of the robotics problems.

[00:21:00] Language: there are a few language classes. CS224N is the standard language class. It is also ML-heavy, just like CS231N, and it talks about a bunch of different language tasks, like parsing and
translation. CS224U is called Natural Language Understanding; people ask what the difference is between processing and understanding. Historically there used to be a bigger difference, but now with deep learning I think these two classes have much more overlap. You can look at the topics; they're slightly different, maybe more emphasis on, I don't know, semantics. There's a class on applications of virtual assistants. And next quarter I'm actually going to be teaching Understanding and Developing Large Language Models; you might have heard me talk about foundation models or GPT-3 or things like that. Beyond just the technical aspects of how these models work and how they're built, there are a lot of social, ethical, and legal considerations, so we're going to talk about some of those
things, as well as giving you hands-on experience: giving you access to these large language models so you can feel them out and even train some of them yourself. So it should be an exciting, interesting class.

[00:22:25] Okay, so the third category is foundations. There are many types of foundations; these are more mathematical foundations. Convex optimization is a great class to really understand optimization. Most machine learning people these days think, you know, run SGD and that's good, and for many things that's fine, for a kind of sloppier optimization that's fine. But there are cases where you do want to optimize your utility function and you need to do something more serious. Also, I took this class, or a similar class, in grad school, and that's
really when I started understanding linear algebra. So even if you're not interested in optimization, I think it gives you familiarity with thinking about linear algebra.

[00:23:20] Statistical inference: there's a whole host of statistics classes, which are important to think about. Machine learning and statistics clearly have a lot of overlap, but they have different emphases: statistics focuses more on scientific discovery, machine learning more on engineering. So some of the questions you might ask are different: you care about hypothesis testing and confidence intervals and the validity of your inferences, because you don't always have a held-out test set or validation set that you can measure performance against, like you have in engineering. So if
you're thinking about more scientific applications, I think a bit of rigorous statistical thinking would be healthy.

[00:24:04] And there's a class, if you ever wonder why it all works, why machine learning and deep learning are so effective: you can take machine learning theory. It talks quite in-depth about fairly technical probabilistic tools, like uniform convergence, that help you explain, or partially explain, the success of machine learning. It won't fully answer the question of why things work; there's a lot left to be understood. But hopefully it will give you at least a little taste of: oh, okay, now I understand, it's not just all heuristics, there are some statistical principles behind what we're doing.

[00:24:48] Cognitive science and
neuroscience are [00:24:53] science and neuroscience are kind of [00:24:54] kind of other areas that feed into ai [00:24:57] other areas that feed into ai kind of science you can think about as a [00:24:58] kind of science you can think about as a software we're thinking about the human [00:25:00] software we're thinking about the human mind so this class uh talks about using [00:25:03] mind so this class uh talks about using probabilistic programs remember from you [00:25:05] probabilistic programs remember from you know the bayesian networks [00:25:08] know the bayesian networks kind of modules to model human reasons [00:25:10] kind of modules to model human reasons so this is really kind of interesting [00:25:13] so this is really kind of interesting um and then [00:25:14] um and then you can look at neuroscience which is [00:25:16] you can look at neuroscience which is has to do maybe more with a clinical [00:25:18] has to do maybe more with a clinical hardware i mean this is a theoretical [00:25:20] hardware i mean this is a theoretical neuroscience class so it's not actually [00:25:21] neuroscience class so it's not actually going to be real [00:25:22] going to be real um you know [00:25:24] um you know you know hardware so to speak but you [00:25:26] you know hardware so to speak but you ask questions like you know what is the [00:25:29] ask questions like you know what is the um you know back propagation which is [00:25:31] um you know back propagation which is the bread and butter of um you know deep [00:25:33] the bread and butter of um you know deep learning actually you the brain can't [00:25:36] learning actually you the brain can't implement that because it's not a local [00:25:39] implement that because it's not a local kind of rule so people have been [00:25:41] kind of rule so people have been interested in these questions like what [00:25:42] interested in these questions like what is kind of a really plausible [00:25:45] is kind of a really 
approximation that explains learning? So there's a pretty interesting open question there.

[00:25:51] Okay, to summarize. Here are the types of classes. Methods: this is going straight ahead, in some sense; you learn about more advanced, general-purpose techniques. All good, but I would really encourage you to also think about applications of AI, especially things that really interest you and that you're passionate about; again, they really help you understand and appreciate the methods that you're learning. And do invest some time in investing in depth. And there are a lot of classes outside AI at Stanford, so definitely explore and don't limit yourself just to AI classes.

[00:26:32] So, just some general tips. Beyond taking classes, there are a lot of resources online: talks, tutorials, blog posts.
It's information rich, and you can learn a lot just from watching things online, if that mode of learning works for you. Some people prefer downloading code and tinkering; a lot of stuff, thankfully, is still open source, and people release their code and tutorials. And just talk to professors and other students, not just about what classes to take, but about how they think about AI and the world, because a lot of learning is not written down in some sort of formulaic textbook. I think the field is moving so fast that sometimes it's just in the heads of a few people.

[00:27:31] All right, so that's the end of the second section. I'll take any questions now.

[00:27:37] Is it okay to take 230 without first taking 229? I believe the answer is yes.
If anyone has taken these, feel free to chime in. I think, especially if you've taken 221, that should be more than enough to take 230. 221 really gets you there: you derive a lot of different learning algorithms, and 229 has you think about things like mixtures of Gaussians and so on, which aren't needed if you're just interested in applying deep learning.

[00:28:12] Any other questions? What would be the best way to talk to professors and other students? That is a good question. I guess Ed is probably not going to be super... I mean, it's probably going to go dead after the course. I guess email is always an option. I mean, the best time to talk to a professor or other students is during the quarter, when they're holding office hours and everything. But maybe, you know, after the course, there are still
[00:28:45] But even after the course, some professors still have office hours.
[00:28:47] Is it okay to take 224N without previous experience in deep learning? The short answer is yes, again conditioned on having taken 221, so you have the basics; 224N starts with some of the basics of deep learning, so you can get by. If you can take the deep learning class first, I think that's better. There's always this thing where the more prerequisites you take, the more time you'll be able to spend actually enjoying the language aspect rather than the deep learning aspect.
[00:29:30] Will CS224N be offered online later? It's going to be offered in person in the winter, so no. In the future? It's definitely a possibility; we haven't thought that far in advance.
[00:29:50] It depends on how much interest there is, I guess.
[00:29:55] What are classes that would be offered remote? For that you'd have to check; I don't know which ones are remote versus in person. I think by default everything is going to attempt to be in person.
[00:30:21] All right, I think that's a lull in the questions, so let me move on to the third part.
[00:30:30] Okay, so now we get to step back and think about where AI as a field is going. If you think about where we are today in AI, I think of AlphaGo as a quintessential image that captures the progress and the optimism that we're feeling today: a very bold effort that surprised a lot of people, experts in both Go and AI.
[00:31:10] And it was really a triumph of sorts for AI, machine learning, and deep learning. You see this optimism and boldness continued with things like GPT-3, which came out last year: OpenAI released this large language model, 175 billion parameters, trained on a huge amount of text, orders of magnitude larger than the previous models. And the cost was something like four or five million dollars (billion would be a lot). One thing that's interesting is that it's just a language model. Remember, a language model is just something that takes a context and predicts the next word. You might think this is the world's most boring task; why would you want to just predict the next word? But it turns out that if you do this at scale, you can do all sorts of other things.
[00:32:05] You can get it to convert natural language into SQL queries, or have it do question answering in a dialogue format. It doesn't do any of these particularly well, but the mere fact that you now have a single model, which wasn't trained for these tasks, doing anything sensible is impressive. The question isn't how well the bear dances; it's that the bear dances at all, in some ways. This has led to a whole era of large models which are really improving accuracy across the board, mostly on language tasks for now, but you see it in vision as well.
[00:32:49] And it's this optimism and progress that's really leading to AI being deployed across a countless number of different areas, from consumer services like Facebook and Google to many other areas as well.
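To make the "predict the next word" objective concrete, here is a toy sketch in Python. It is just a bigram counter over a made-up corpus, nothing like GPT-3's architecture or scale, but the training signal is conceptually the same: given a context, predict what comes next.

```python
from collections import Counter, defaultdict

# Toy illustration of the language-modeling objective: count how often each
# word follows each context word in a (made-up) corpus, then predict the
# most frequent continuation. GPT-3 does this with long contexts and 175B
# parameters, but the task it is trained on is conceptually this one.
corpus = "the cat sat on the mat the cat ate the fish".split()

following = defaultdict(Counter)
for context, nxt in zip(corpus, corpus[1:]):
    following[context][nxt] += 1

def predict_next(context_word):
    """Return the most likely next word after `context_word`, or None."""
    counts = following[context_word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat": it follows "the" more often than "mat" or "fish"
```

The surprising point in the lecture is that scaling this same objective up, on vastly more text, yields a model that can be coaxed into translation, SQL generation, and question answering without task-specific training.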
[00:33:10] Obviously to a lesser extent there, because those areas don't have as much AI expertise as the tech giants. It's also being applied in many areas like education, credit, and employment, which really start to affect people. That said, some of these AI systems are logistic regression, not GPT-3; many of them are actually closer to logistic regression than to GPT-3. But nonetheless, this whole umbrella of using data-driven methods to automate certain types of decision making is a general trend that encompasses many different regimes.
[00:33:51] So what I want to spend the last part of the lecture reflecting on is the societal impact of this trend, having spent a whole quarter talking about the technology.
[00:34:07] I just want to use a simple example: machine translation. Many of you have probably used it, and it's one application where quality has improved significantly due to advances in AI, which is great. It can help break down language barriers, increase accessibility, improve the productivity of the economy, and so on. So this is generally positive. But you should always look at the flip side of things: while these systems are ubiquitous, they have problems. For example, Hungarian is a language that doesn't distinguish between female and male third-person pronouns, so when you translate into English, the system has to guess the gender of each pronoun, and you can see that it patterns very stereotypically along professional stereotypes.
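A minimal sketch of how that guess can go wrong: if a system simply picks the pronoun that co-occurred most often with a profession in its training data, it reproduces whatever skew the data carries. The co-occurrence counts below are invented for illustration, and this is not how any real translation system is implemented; it just shows the mechanism.

```python
# Toy sketch (invented counts, not a real MT system): a purely statistical
# "translator" that, for a gender-neutral source pronoun, emits whichever
# English pronoun co-occurred most often with the profession in training data.
cooccurrence = {
    "nurse":    {"she": 90, "he": 10},   # hypothetical corpus counts
    "engineer": {"she": 15, "he": 85},
}

def guess_pronoun(profession):
    """Pick the pronoun seen most often with this profession."""
    counts = cooccurrence[profession]
    return max(counts, key=counts.get)

# The output reproduces the skew in the data, not anything about the person:
print(guess_pronoun("nurse"), guess_pronoun("engineer"))
```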
[00:35:08] And these kinds of biases, if you don't bother to think about them, maybe don't seem to raise an alarm. But I think this is actually a frog-in-boiling-water kind of setting, where bias starts creeping insidiously throughout society and gets amplified, so it's something to really be careful of.
[00:35:34] There's also weirder stuff. There's one example from a few years ago in Māori, which is a language that doesn't have that much data: you type in some nonsense like "dog dog dog" and you get some really disturbing stuff coming out, and no one really knows why this happens. I think many of these issues are due to the fact that machine learning thrives on complex models fitting spurious correlations in data.
[00:36:07] So it's like we're pushing the limits of what we can do, and that's the outlook I think the field has had for quite some time. You have to remember that even ten years ago, computer vision basically didn't work. People spent decades trying to get things to work at all, and now things work well; now we have other things to worry about.
[00:36:35] So I want to highlight something called spurious correlations, which I think is a cautionary tale. Here's a task: you take an X-ray image of a chest, and you're trying to predict whether there's a collapsed lung or not. If you apply standard computer vision machinery, this works pretty well.
[00:37:04] But take a closer look at this image and see that tube coming out here. That's called a chest drain, and it's a common treatment for collapsed lungs. It turns out this is one of the signals the model is picking up on: it effectively says, hey, this person was treated for a collapsed lung, therefore he has a collapsed lung. So if you look at the accuracies, the AUC, on the entire population versus on the people who have chest drains, you're predicting on the latter much more accurately than on the people without chest drains. It might seem like you're doing pretty well overall, but on the segment of the population that doesn't have chest drains, you're actually doing not so well.
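One way to catch this in practice is to slice the evaluation by subgroup instead of reporting a single aggregate number. This is a minimal sketch with invented labels and predictions, where a hypothetical `has_drain` flag marks the treated patients the model gets "for free":

```python
# Invented data: (has_drain, true_label, model_prediction). The hypothetical
# model is perfect on patients with a chest drain (the drain gives the answer
# away) and poor on the untreated patients we actually care about.
records = [
    (True,  1, 1), (True,  1, 1), (True,  1, 1), (True,  0, 0),
    (False, 1, 0), (False, 0, 0), (False, 1, 0), (False, 0, 1),
]

def accuracy(rows):
    return sum(y == pred for _, y, pred in rows) / len(rows)

overall  = accuracy(records)
drain    = accuracy([r for r in records if r[0]])
no_drain = accuracy([r for r in records if not r[0]])
print(overall, drain, no_drain)  # the aggregate hides the weak subgroup
```

The same idea applies with AUC or any other metric: compute it per subpopulation, not just on the pooled test set.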
[00:38:01] And that is exactly the subpopulation of untreated patients that you actually care about, because if they already have a chest drain, you don't need a prediction of whether they have a collapsed lung or not. So this is a cautionary tale: you need to not just look at the accuracy but really understand how the model is actually making its predictions, because if it's just latching onto spurious correlations and you go deploy it, it might not be so good.
[00:38:25] Here's another example. Suppose you're trying to figure out the effect of a treatment on the survival of patients, and here's the data from some study: for untreated patients, eighty percent survive, and for treated patients, thirty percent survive. So the question is: does the treatment help? How many people think it helps? Maybe raise your hand if you think it doesn't help, or just put something into the chat.
[00:39:01] That's fine too; I'm trying to make this a little bit interactive and get people to think. "Doesn't help." "That's possible." "Unclear whether it helps." Yeah, "who knows" is right. If you're very naive about it, you might think, okay, survival is correlated with not treating. But exactly: sick people are more likely to undergo treatment. So there's a hidden confounder here, which is how sick you are, and this data doesn't tell you anything. If you're just doing machine learning naively, you could really be doing completely the wrong thing. There's this whole field of causal inference which provides rigorous machinery to help you answer these kinds of questions, especially in high-stakes medical settings where there may be a lack of data and a lack of ground truth.
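The survival numbers above can be reproduced qualitatively with a tiny simulation. In this sketch (all probabilities invented), the treatment raises every patient's survival chance, yet the naive comparison still makes it look harmful, because sicker patients are the ones who get treated:

```python
import random

# Toy confounder simulation: by construction the treatment HELPS everyone
# (+0.2 survival probability), but sick patients are far more likely to be
# treated, so the naive treated-vs-untreated comparison reverses the effect.
random.seed(0)

def survival_prob(sick, treated):
    base = 0.3 if sick else 0.9          # severity drives survival...
    return min(base + (0.2 if treated else 0.0), 1.0)

patients = []
for _ in range(100_000):
    sick = random.random() < 0.5
    treated = random.random() < (0.9 if sick else 0.1)  # ...and treatment
    survived = random.random() < survival_prob(sick, treated)
    patients.append((treated, survived))

def rate(rows):
    return sum(s for _, s in rows) / len(rows)

treated_rate   = rate([p for p in patients if p[0]])
untreated_rate = rate([p for p in patients if not p[0]])
print(f"treated: {treated_rate:.2f}, untreated: {untreated_rate:.2f}")
# Treated patients survive less often in aggregate, even though the
# treatment helps every individual: severity is the hidden confounder.
```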
[00:40:04] You really, really need to be careful. For machine translation, you can get a human to look at the sentence and say, yep, seems reasonable, and that's the more typical engineering attitude: I try it, and I can always validate whether it works. When you can validate that it works, maybe it's okay to use something you don't fully understand. But when you have to rely on it and there's no validation, I think you have to lean much more on first principles.
[00:40:37] So the cautionary tale is: always be aware of the limitations of a technology, and machine learning definitely has a lot of limitations. I think it's really important that you don't walk away from this class thinking, oh yes, machine learning, now I know how to do SGD, I get a dataset, and I can just go with it.
[00:41:00] You have to be aware of the limitations.
[00:41:02] All right, so now the second part of this module: I'm going to talk about AI ethics. Many of you have probably heard the term AI ethics thrown around, and there's often a lot of heat around this term: ethics, people are not being ethical, what's going on here. At the broadest level, it's about how we ensure that AI is developed to benefit society and not harm society. That sounds, well, not easy, but uncontroversial, right? And there are a lot of principles, and people have written a lot about this. I'm not an ethicist, so I can't speak to this in great depth.
[00:41:52] But starting with the Belmont Report from 1979 on human subjects research, there's the ACM Code of Ethics, and all these companies are now putting out all kinds of responsible AI principles, so it seems like there are a lot of guidelines, which is good. Often these things say things like: respect persons, do no harm to people. You look at that and say, okay, well, yeah, I don't want to harm people. But the real question is how these high-level principles relate to the concrete actions you take, because a lot of these ethical issues aren't about malice or misguided intent; they really have to do with ignorance. If you're not aware of something, bad things can happen.
[00:42:49] So what I'm going to do is walk through a few specific considerations under this umbrella of AI ethics, which hopefully gives you a bit more concrete guidance. The considerations are: data; the objectives you optimize; inequality, which we've talked about before; the idea of harmful applications, for which I'm going to do a survey, so get ready for that; and then automation versus augmentation.
[00:43:24] All right, so moving on: data. AI is largely powered by machine learning, and without data there's no machine learning. So we naturally have to ask: what is this data we're talking about? Here's an example. There's a dataset called Tiny Images, 80 million images, which had been used since 2006 in the computer vision community, and it was actually taken down because it was found to have various kinds of offensive content in it.
image then had some of [00:44:08] know in it even image then had some of these kind of uh [00:44:09] these kind of uh um objectionable you know issues and it [00:44:12] um objectionable you know issues and it was kind of cleaned up you know [00:44:14] was kind of cleaned up you know afterwards [00:44:15] afterwards so [00:44:16] so you know a lot of the times you know ai [00:44:18] you know a lot of the times you know ai systems are relying on web script data [00:44:20] systems are relying on web script data and we know sometimes on the web it's [00:44:23] and we know sometimes on the web it's not a prepaid place and if you're just [00:44:25] not a prepaid place and if you're just scraping data and not very carefully [00:44:26] scraping data and not very carefully looking at it you can kind of inherit a [00:44:29] looking at it you can kind of inherit a lot of these kind of offensive [00:44:32] lot of these kind of offensive material [00:44:35] material uh [00:44:36] uh second of all um [00:44:39] second of all um you know there are historical biases [00:44:41] you know there are historical biases inherent in data so kind of social [00:44:43] inherent in data so kind of social biases with race and gender um even if [00:44:46] biases with race and gender um even if they're not offensive the idea that you [00:44:49] they're not offensive the idea that you know maybe the lack of representation of [00:44:51] know maybe the lack of representation of certain kind of marginalized populations [00:44:54] certain kind of marginalized populations is itself kind of a problem so you have [00:44:56] is itself kind of a problem so you have kind of two types of problems one is you [00:44:58] kind of two types of problems one is you can represent people or uh badly or you [00:45:02] can represent people or uh badly or you cannot represent them and both are you [00:45:04] cannot represent them and both are you know things to worry about [00:45:07] know things to worry about so 
[00:45:10] There's another thing that people don't normally think about when it comes to data. Say I go on a vacation, take a picture of my dog, and post it on Flickr, and then some big tech company scrapes it, does some pre-training on it, and uses it to do scene classification. Is this good? I mean, should it be allowed? Right now we're pretty laissez-faire about this: internet scrapes are the norm, and there's no consent. A lot of things are copyrighted, and I'm sure there are tons of potential copyright violations. So there's a question here: data is produced by people for doing certain activities, right?
write a book i send [00:46:13] post an article i write a book i send messages to people [00:46:14] messages to people and [00:46:15] and you know machine learning is something [00:46:17] you know machine learning is something that kind of sits on top and kind of [00:46:19] that kind of sits on top and kind of siphons that data for [00:46:20] siphons that data for usually another purpose [00:46:22] usually another purpose and the question is [00:46:24] and the question is you know whether [00:46:25] you know whether what right do i have to say like no that [00:46:29] what right do i have to say like no that should be allowed or not allowed and [00:46:30] should be allowed or not allowed and often this kind of goes without even [00:46:33] often this kind of goes without even your users being aware of what's [00:46:35] your users being aware of what's happening [00:46:39] so another piece of data that is [00:46:41] so another piece of data that is important is um [00:46:44] important is um you know how much work it takes to [00:46:46] you know how much work it takes to produce it [00:46:47] produce it so often we think about you know [00:46:49] so often we think about you know technology [00:46:50] technology and machine learning methods because [00:46:53] and machine learning methods because that's kind of well i mean from a [00:46:55] that's kind of well i mean from a computer science perspective that's kind [00:46:56] computer science perspective that's kind of the object of study [00:46:59] of the object of study uh but more and more i think it's [00:47:02] uh but more and more i think it's important to be aware that [00:47:04] important to be aware that um [00:47:05] um you know data takes you know what's [00:47:07] you know data takes you know what's powering all these things [00:47:09] powering all these things right like you think of ai as reducing [00:47:11] right like you think of ai as reducing human labor it makes things more [00:47:13] human labor it 
It makes things more efficient and so on, but it's not free; it requires resources. There's an excellent book by Mary Gray and Siddharth Suri called Ghost Work that documents the amount of human labor, usually crowdsourcing, that is used to create datasets or to moderate and flag content, which powers these AI systems. So a lot of AI systems have this veneer of being automated, but really they're being powered by people at some level.
[00:47:47] There's one example I want to point out, which is good food for thought. In machine learning, we like to think about the distinction between labeled data, which is really expensive to obtain because you have to pay people to label it, and unlabeled data, which is cheap or even free. But if you think about it, going back to what I said about data being created by people expending effort: think about quote-unquote "raw text," books and articles. It's free because, well, we just took someone else's book, which they spent a whole year writing, and we didn't pay them for it. That's why it's free. So it's important to keep the perspective that a lot of machine learning is deriving value from the labor of people who are not being compensated for it. Just a little bit of perspective.
[00:48:51] All right, so the second topic is objectives. Optimization is touted throughout this class as a powerful paradigm: it allows you to express a desire in the form of an objective function, and then separate that desire from the resources and algorithms that realize it.
You make a wish, and then you can get it to come true; that's the power of optimization. But the question is: what should the objective be? Ideally it would be something like happiness or productivity, but usually those things are impossible to measure, so we often use surrogates for them. And then, okay, we're not getting the thing we actually care about, because the surrogates are approximations. Furthermore, there are different incentives: businesses are incentivized to maximize profit. Nothing against them, that's what they're designed to do, but that's not always aligned with the social good.
[00:50:00] Just an example: most internet companies use clicks or views as a major component of their objective function. Why? Because it's the signal they have, and it's really good at driving up profit. Usually it does reasonable things; it gives you what you say you want. But obviously, people's reflexive actions at any given moment don't necessarily represent their long-term goals, and moreover, at a societal level, we see that this leads to potentially big problems like polarization, which is a whole other topic that I won't get into. So I think it's always important to think about what objectives you've set out to optimize, and to beware of any surrogates or misaligned incentives.
[00:50:49] Inequality: we talked about this in the machine learning lecture. Remember the Gender Shades project.
Image recognizers there worked differently on different populations of people. What do you do about this? Well, you can collect more data for certain groups, but often this is hard and expensive to do, so there might not be incentives to do it unless there's regulation. [00:51:27] So one solution is data. A second solution is in the methods: we looked at how you can minimize the maximum group loss using group DRO, and mitigate some of these performance disparities. Of course, it's a big philosophical question what kinds of tradeoffs you want to make, which you've had the opportunity to reflect on in the homework. But one thing I want to mention is the idea of auditing, which I think is a really powerful force.
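As an aside, the "minimize the maximum group loss" idea can be sketched in code. The variant below takes each gradient step on whichever group currently has the highest loss; the toy data, the linear model, and this greedy worst-group update are illustrative assumptions, not the exact formulation from the homework (full group DRO maintains soft weights over groups rather than picking a single worst one).

```python
# Minimal sketch of group DRO for a linear classifier: instead of
# minimizing the average loss over all examples, take a gradient step
# on the loss of the *worst-off* group at each iteration.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two groups with the same labeling rule (sign of feature 0)
# but shifted input distributions; group b is much smaller.
X_a = rng.normal(0.0, 1.0, size=(100, 2))
X_b = rng.normal(0.0, 1.0, size=(20, 2))
X_b[:, 1] += 2.0                                # distribution shift
y_a = (X_a[:, 0] > 0).astype(float)
y_b = (X_b[:, 0] > 0).astype(float)
groups = [(X_a, y_a), (X_b, y_b)]

w = np.zeros(2)

def group_loss(w, X, y):
    """Average logistic loss of one group (labels in {0, 1})."""
    margin = (2 * y - 1) * (X @ w)
    return np.mean(np.log1p(np.exp(-margin)))

for step in range(500):
    losses = [group_loss(w, X, y) for X, y in groups]
    worst = int(np.argmax(losses))              # group DRO: focus on worst group
    X, y = groups[worst]
    s = 1.0 / (1.0 + np.exp(-(X @ w)))          # sigmoid predictions
    grad = X.T @ (s - y) / len(y)               # gradient for worst group only
    w -= 0.1 * grad

final = [group_loss(w, X, y) for X, y in groups]
print("per-group losses:", [round(l, 3) for l in final])
print("max group loss:", round(max(final), 3))
```

The point of the objective is exactly the tradeoff discussed above: you may give up some average accuracy, but the group that the model serves worst is never ignored.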
A lot of systems, especially commercial systems: you don't really know what's going on inside them. And in the Gender Shades example, after the study came out, companies were incentivized to fix the problem, and after a period of time the disparities had largely vanished. So the mere fact of studying what systems are doing can sometimes be enough to incentivize companies to take action.
[00:52:34] Okay, so now I'm going to do a little bit of audience participation, so get ready to raise your hands. There's a question of which applications are ethically okay; this is going to be interesting. And moreover, when a researcher makes an advance, how do you assess its potential harms?
[00:53:02] Here's one: autonomous weapon systems, powered by AI, that can track objects and fire missiles or whatever. Maybe you can vote: thumbs up if you think this is an okay application, thumbs down if you think it's not.
[00:53:27] Okay, so I think most people who voted, if not all, say no. This should be an easy case; I don't know what everyone else thinks, but hopefully it is. Maybe you could make a case for it, but I think this is largely regarded in the community as very, very ethically problematic, at the very least.
[00:54:00] [A student:] I wrote yes for a defensive role, because I was thinking: if there were an attack, and an autonomous system like Israel's were able to intercept missiles, and that protected a lot of people, then from that perspective it might be good to have automation.
[00:54:27] Yeah, I'm not going to get into the detailed argument; I think you can debate these things for a long time. But I'm trying to lay out a spectrum, and I think this is one example that most people consider very problematic. So what about deepfakes? Again, vote: how many of you think this is okay technology, versus not okay? They have genuine use cases, maybe in entertainment, if you want to create an avatar or pretend you're a celebrity. But on the other hand, of course, as this picture points out, you can fake Barack Obama or some head of state; you can doctor a video that gets them to say anything. So this is potentially pretty problematic from the point of view of disinformation, where people can't tell what's fact and what's fiction.
So what about image generation? This gets a little more interesting. Suppose you're not generating Barack Obama, you're just generating cute puppies. How many of you think this is okay? Cute puppies, yeah; who doesn't want more cute puppies? Okay, so more yeses.
[00:56:00] Intuitively it feels like it should be okay, because you've distanced yourself from any sort of weaponry. But the truth is that a lot of these methods are actually general-purpose: if you can generate cute puppies, you can generate Barack Obama. This is where I think the ethical dilemma comes in, this idea of dual-use technology. The same technology can be used to put a smile on your face or to spread disinformation. I'm not going to offer a solution, even if there were one, but the process of thinking about this while you're doing AI is extremely important.
[00:57:00] And you can go even farther and ask: what about deep learning? Maybe most people would say deep learning is probably fine, because there are so many good things you can do with it. But some people would argue that developing technology which enables large organizations to amass data and centralize power is inherently evil; you could take that position as well. So there's no right or wrong answer here, just a spectrum of viewpoints.
A lot of AI ethics is this process of debate and reflection. It should not be: here are the principles, and if you just follow them you get a stamp of approval. It's not like that at all. It's about internalizing these questions and carrying them with you at all times.
[00:58:00] Okay, so the final thing I want to talk about is automation versus augmentation. You see a lot of news like: AI is dangerous because it can replace jobs, or you might think about an AI that goes rogue. I think a lot of this, whether it's a real worry or not, has to do with the framing itself. Ever since the inception of AI, there has been this idea of an agent that is supposed to be intelligent.
It can do things in the world, and once you call it an agent, that means it has agency; it's its own entity in some sense. And if you frame it like that, then you're fighting an uphill battle to coax it into being aligned with human values: wait, wait, no, I didn't mean that, let's get it to do what we want. This is deeply ingrained in the framing of AI, from things like the Turing test, which is about an agent that can deceive a human being, for what it's worth; RL agents that are autonomous and doing their own thing; and the whole idea of AGI, artificial general intelligence. And this leads to a very explicit automation perspective: well, you have an agent, it's doing things, and now it's going to do the thing that the human was doing.
[00:59:42] Now, if you go back to the 1950s, there was another interesting line of thinking called IA: intelligence augmentation, or amplification. It was about creating tools that help humans, and it is in some ways a predecessor of HCI, human-computer interaction, which focuses on augmenting human abilities. I think this perspective lets us sidestep a lot of these problems, because baked into the premise of IA is that we are trying to make humans smarter or faster or whatever; it's human-centric, as opposed to agent-centric. A lot of the interesting AI moonshots, like the Turing test or an agent that can play chess, would not have been pursued under an IA agenda. It's clear today that AI, by focusing on the agent perspective, has led to a lot of powerful technology. But it's also clear that we need a lot more IA thinking to help shape this technology, because fundamentally we should be developing AI to improve the human condition.
[01:01:15] Okay, so this is the final slide. AI is a technology, and like most powerful technologies, it's a dual-use technology. That means it can improve efficiency, accessibility, productivity, dare I say happiness, I don't know; it can do a lot of good things. But it can also do a lot of damage. It can be explicitly used to harm people, but even putting that aside, harm can come simply from not being cognizant of certain issues.
AI can exacerbate social inequalities; you can do harm without people even thinking about it. That is why I want to stress so much the idea of being aware of these issues; I think that's half the battle in making progress here. And the final takeaway: just because you can build something doesn't necessarily mean you should. Maybe you should, maybe you shouldn't; you should always ask yourself what the benefits and the risks are. This might sometimes mean slowing down and challenging the status quo, which is uncomfortable, because we're used to charging ahead in the march of progress. There aren't any easy answers, but I think really mindful deliberation can go a long way toward making AI more ethical.
[01:02:45] All right, so that is the end of the lecture.
Hopefully you guys learned a lot, and hopefully this was good food for thought. Please give feedback on the course evaluation on Axess, and thanks for an exciting quarter.
[01:03:08] All right, I think we have a bit of time for questions.
[A student:] I'd like to hear your thoughts on the slide you had about deepfakes and picture generation and things like that. I feel like, a while ago, if you wanted to verify something, you would read it from a verifying source on the internet, and then it was like: if you have a video of a person talking, that's more reliable, because written content you can't trust. But now, with deepfakes, videos are not so reliable either; there's a sense of an erosion of verifiable truth in digital content. What would your thoughts be on that?
[01:03:55] Yeah, I mean, everything you said is true.
true. We can't really trust what we see online, and this is going to be even more true going into the future. I don't think all hope is lost; it just means we need to reset our expectations and have other mechanisms for validation.

[01:04:27] I think there are non-AI things you could attempt to do. For example, authentication of provenance: this video, or this image, or this text was actually produced by this entity, at this particular time and place, and it was certified. And you have to design a kind of secure mechanism for authenticating, so this is more in the realm of security.

[01:05:02] But another example is: Photoshop exists, and I think we're all okay. You know, I
think we, maybe... I mean, video might be a little bit more visceral in some sense, but routinely there are images that can easily be photoshopped with high fidelity, and we don't necessarily trust those. So what I guess I'm trying to do is not sound too pessimistic about the future: there are things we can do, but we do need to do them.

[01:05:43] And I think, when developing our technology, it's going to happen eventually; most of this is just buying us time, slowing things down enough so that we have time to react. I think in 20 years there's no way you can stop people from having deepfakes. I mean, it's much, much earlier than 20 years, but
just to give an upper bound.

[01:06:15] Thank you. [01:06:23] [Music] [01:06:30] ...less and less harmful in the long run, just like, if everyone in the world can train their models?

[01:06:44] Not necessarily, I think. I don't like using an analogy, but if you imagine everyone can build a nuke in their backyard, that doesn't mean that things are better. It could just lead to an arms race between attackers and defenders, as well; whoever has the most powerful model can kind of win. I mean, it's really interesting, because I've been such a big proponent of transparency and openness, and in research you just put everything out, right? You're supposed to; that's the whole idea of science. But sometimes there are technologies, and there's a situation where maybe it can do harm.

[01:07:54] ...sense. [01:07:57] Yeah. One question is: what
constitutes common sense? Uh, there has been a bunch of work in common sense reasoning in the last five years. Yejin Choi, a professor at the University of Washington, has done a lot of excellent work in this area. Common sense reasoning used to be talked about, you know, before machine learning, and then people didn't really work on it, but now it's kind of coming back. But it's tricky; it's really a slippery concept, what constitutes common sense and how you get your hands around it.

[01:08:36] Thank you. [01:08:40] Any other questions?

[01:08:44] What are some good sources to know what is the latest happening in the AI industry? [01:08:52] What is the latest, sorry, happening in the AI industry? Yes, so, in different fields, what is the new technology coming up for applications to build on?

[01:09:06] Um, you're asking just generally what happened? So you're asking for references where
you can find out about how to keep up with the latest AI? Um, yeah, I don't know if there's a definitive source. I mean, arXiv, I guess, provides a feed of the latest papers, often. Blog posts or Twitter: people post a lot of recent advances there. I guess social media, for lack of a better, concrete description.

[01:09:54] I would say that it is a very biased sample: it's the things that are generally done in research, done in prominent research labs, which is good. I mean, yeah, follow other ML researchers on Twitter; that's how you learn about stuff. I think there's also a lot of AI in private organizations where people aren't
publishing, and there it can be hard to figure out what's going on.

[01:10:32] Anything else? [01:10:35] Why do people publish models even if they are expensive?

[01:10:43] There are, I think, multiple reasons. Publishing models allows other people to build on top of your work, so it's good for the community to have more sharing; you can make progress faster. Also, on a kind of selfish note, you get recognition if other people build on top of your work; that's kind of the academic model in some sense.

[01:11:14] Okay, well, if there's nothing else, then let's end there. Thanks everyone again for coming to the last lecture. After a whole quarter of modules, I guess it's kind of nice to get a little bit of live interaction, although I've seen many of you at the faculty chats, so that's been nice. But yeah, good luck with the rest of your quarter, and see you next time.
================================================================================
LECTURE 057
================================================================================
Stanford CS221 | Externalities and Dual-Use Technologies | 2023
Source: https://www.youtube.com/watch?v=2xQLCXqOtdU
---
Transcript

[00:00:07] Hello, this is your embedded ethics team. In this video we will be discussing externalities and dual-use technologies, to help you answer the homework questions. We will define externalities and dual-use technologies, two concepts that relate to how AI both positively and negatively impacts society. To help make these definitions clear, we will be going over several examples. We will also provide some theoretical background on these concepts that will help you be proactive in identifying externalities and dual-use technologies in the future.

[00:00:43] First, let's begin by looking at externalities. An externality is a consequence, positive or negative, that arises from one party's action and impacts another party. Externalities are the result of either the
production or consumption of a good or service. For example, when I produce electricity by burning coal, I produce electricity efficiently but release pollutants into the air, impacting the people around me; this is a negative externality. When I maintain my house's yard well, it raises the property value of my neighbors' houses; this is a positive externality. The impact of the externality can be private, affecting an individual or organization, or social, affecting society as a whole. Sometimes technology can have both positive and negative externalities, and sometimes it's a little less clear whether the externality is positive or negative.

[00:01:32] Let's take a look at an example. AncestryDNA, 23andMe, and a variety of other services provide ancestry testing by using genetic data to estimate the geographic origins of a person's ancestors. To obtain this service, users
provide the company with a DNA sample. There are both positive and negative externalities that arise from this. The positive externalities include the ability to connect individuals with their biological family members, or to inform them about genetic predispositions and health risks. The negative externalities include selling genetic information to third parties and mishandling data. Additionally, ancestry testing has been used to find and convict criminals by mapping out a family tree of distant relatives until a suspect was identified; this was the case with the Golden State Killer. Depending on your viewpoint, the use of ancestry testing to identify these individuals could be considered a positive or a negative externality.

[00:02:23] Externalities reflect the consequence of an action from one party onto another. Now we will talk about dual-use technologies, which refer to the
impact that arises from secondary usage of a specific technology. The dual-use dilemma is a phenomenon where a technology, or a product of research, has a dual effect of positive and negative consequences. This concept arose out of bioethics in medicine, a field where medical innovation often leads to inadvertently tragic or even fatal outcomes.

[00:02:58] A classic example of a dual-use technology is the Manhattan Project, headed by the US government during World War II. Let's talk through how this technology is dual use. When Oppenheimer began his research into theoretical physics, he did not intend to create a bomb, but during the volatile political climate of that time, his strictly academic research bled into the public and geopolitical sphere, in an arms race with the Nazi regime. There were definitely positive outcomes of Oppenheimer's work. The first was a purely intellectual one: the academic
freedom to uninhibitedly participate in intellectual inquiry. The second was the immense potential for nuclear research to be used in ways that benefit society, for example providing a clean energy source.

[00:03:44] However, there were also significant harms that arose out of Oppenheimer's work. For instance, the product of his work, the atomic bomb, was dropped on Hiroshima and Nagasaki, killing nearly 230,000 people. Who is to take responsibility for this?

[00:04:04] An important thing to remember is that sometimes the thing you intend for your technology to do is not the only thing it can or will do. Since technology must always be created with this understanding, it's important to be proactive in thinking about dual-use outcomes. Some dual uses for a certain technology will be easier to predict than others. Let's walk through four scenarios
to guide your thinking about potential dual-use cases. To make this concrete, we'll also consider an example throughout this slide: specifically, large language model chatbots like ChatGPT.

[00:04:44] We first begin by thinking about the intended outcomes: how you expect your technology to behave. For example, OpenAI says that the purpose of ChatGPT is to follow an instruction in a prompt and provide a detailed response.

[00:05:01] The second scenario is unintended but foreseen outcomes. These are behaviors or actions that your technology exhibits that were not designed for intentionally, but that the designers did conceive of. For example, OpenAI knew that there could be false information disseminated through ChatGPT, since it is only a large language model, not any definitive source of information.

[00:05:27] The third scenario is unintended but foreseeable outcomes. This
is a superset of the outcomes captured by the second scenario: it includes all outcomes that could have been reasonably foreseen by the designers, even if the designers did not actually foresee them. For example, ChatGPT has a huge potential for displacing human workers, including those who perform jobs that require specialized skill sets. OpenAI is doing work to address this issue, but all this work is retroactive.

[00:06:01] The fourth scenario is unforeseen, and possibly impossible to have foreseen, outcomes. These are unintended outcomes that would have been unreasonable to foresee. For example, last year journalist Kevin Roose reported that during his lengthy and personal conversation with Bing's chatbot, it professed its love to him. Microsoft was then in a flurry to determine the root cause of this erratic behavior, and ultimately decided that it was a case of
hallucination.

[00:06:34] Another example of dual use in the context of AI is current research on developing machine learning models that identify toxicity in liquids. Let's think about how this can be an example of a dual-use technology. The positive effects of this technology are plentiful: currently, less than 1% of chemicals in commercial use in the US have undergone toxicity characterization, and the characterization process is so laborious and costly that chemical growth vastly outweighs the capacity to characterize them. However, these models could also be developed to engineer viruses or toxins; they could even be used further to target specific individuals or communities. So we really need to think about how we keep individuals or institutions responsible for self-regulating and anticipating these outcomes.

[00:07:23] Now, this can be hard, because dual-use technologies are not created in
a vacuum. Dual-use technologies are a product of a collective institution or organization, such as a university, a company, or even the military, and there is often immense pressure from these institutions for individuals to publish a research paper, generate a profit, or defend one's country. Finally, institutions are often intentionally constructed so that individual workers are strictly limited to one component of the final product. This means that oftentimes they don't get to see the bigger picture, and it can be very hard for them to predict what kind of outcomes a piece of technology might have. However, despite these challenges, it is still important to consider what possible dual uses might arise from a specific piece of technology when we are thinking about designing and developing it.

================================================================================
LECTURE 058
================================================================================
Stanford CS221 | The AI Alignment Problem: Reward Hacking & Negative Side Effects | 2023
Source: https://www.youtube.com/watch?v=5WHObJWE1FE
---
Transcript

[00:00:05] Hello, this is your embedded ethics team. In this video we will be discussing the AI alignment problem and go over two ways in which these problems are instantiated: reward hacking and negative side effects. After watching this mini-lecture, you should be better prepared to answer problem five in the homework assignment. We'll define the AI alignment problem, discuss these two problems and give some examples to help you identify them in the future, and also discuss the ethical implications of the AI alignment problem.

[00:00:44] Let's begin by talking about the AI alignment problem. The goal of AI alignment is to ensure that AI is properly aligned with human interests. AI misalignment occurs
when an AI system is not able to achieve this. [00:00:59] So how do we define what alignment looks like? The first approach could be: the agent does what I instruct it to do. It's simple: I give it a set of instructions and it follows them. But in reality it is more complicated. Think about large models, like large language models: it's not possible for us to take such a literal approach, because there are so many parameters, contingencies, and possibilities that we cannot give an instruction for all of them. This approach also runs into issues of reward hacking, which we'll talk about later in this video.

[00:01:31] Then, what about if the agent does what I intended it to do? Suppose our development in AI is advanced enough for our models to understand the intentions behind our instructions; say they grasp our human language, our cultures, and our practices. That sounds convincing.
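The reward hacking mentioned above can be made concrete with a minimal sketch. This toy example is not from the lecture; the "cleaning robot" scenario, the proxy reward, and all function names are illustrative assumptions. The agent is scored by a proxy (a mess-sensor reading) rather than by the true goal (a clean room), so a policy that corrupts the sensor outscores one that actually cleans:

```python
# Toy illustration of reward hacking (hypothetical scenario, not from the lecture).
# The designer's proxy reward pays the agent for reporting little mess.
# An honest policy cleans; a hacking policy covers the mess sensor, which
# maximizes the proxy reward while leaving the real mess untouched.

def proxy_reward(sensor_reading):
    # Designer's proxy: more reward when the sensor reports less mess.
    return 10 - sensor_reading

def run_policy(policy, mess=10, steps=5):
    sensor_on = True
    total = 0
    for _ in range(steps):
        if policy == "clean":
            mess = max(0, mess - 2)   # actually reduces the real mess
        elif policy == "cover_sensor":
            sensor_on = False         # corrupts the measurement instead
        reading = mess if sensor_on else 0
        total += proxy_reward(reading)
    return total, mess                # (proxy reward earned, real mess left)

honest_reward, honest_mess = run_policy("clean")         # (30, 0)
hacked_reward, hacked_mess = run_policy("cover_sensor")  # (50, 10)
# The sensor-covering policy earns more proxy reward than the honest one,
# even though the real objective, a clean room, is completely unmet.
```

The gap between the proxy reward and the intended objective is exactly what makes such hacks possible; closing that gap is the alignment problem the lecture describes.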
but again we run into a problem: what if our intentions are irrational or misinformed? Should we still permit these models to operate according to our intentions? [00:01:58] Okay, then let's say we want our agent to do what I would want it to do if I were rational and informed. This way we avoid lapses in judgment or errors from limited information, but this doesn't prevent us from wanting unethical or harmful things. Depending on our notion of rationality, which we won't get into here, and however informed we are, we can still arrive at desires that nevertheless seem morally reprehensible. [00:02:27] Now we finally arrive at the values approach: we design our AI models to do what they morally ought to do, as defined by the individual or our broader society. Values indicate our judgment of what's good or bad, and of what should be morally praised or reprehended. With a values-based approach we can avoid all
the difficulties we encountered with our previous conceptions of alignment. Additionally, we can think beyond the simple calculation of maximizing good and think about how our AI models can promote our notions of justice and right. [00:03:00] Importantly, though, the values-based approach is not the end-all be-all; there can be many criticisms of the values-based approach. Similarly to how we walked through the other definitions of alignment, try to think about what some potential pushback against the values-based approach might be. [00:03:17] How we decide which values to align our AI models with can be a bit tricky, and there is no consensus on which approach is best. Values are often specific to certain use cases and communities, so determining which values to prioritize often requires being sensitive to the various cultural norms and values that your users may hold. Here we'll share three possible frameworks,
rooted in philosophy and ethics, that you could draw on to align AI models with values. The first principle is selecting values that are aligned with global public morality and previously codified human rights. Even though which values are important can vary among different communities, there are certain principles of justice that are supported by the majority of people, for example basic human rights such as the belief that all individuals should be given food, water, education, and protection from physical violence. Oftentimes these have already been implemented into regulations by government organizations. [00:04:16] The second is choosing values behind a veil of ignorance. The veil of ignorance is a thought experiment introduced by the philosopher John Rawls that asks people to consider a device that prevents them from knowing their own particular moral beliefs or
the position they will occupy in society. So, using the veil of ignorance, we might ask what principles people would choose to regulate an AI system if they did not know who they were or what belief system they ascribed to. In other words, what principles or values might people select if they did not know for certain how the AI system would impact them? This principle assumes that people are risk averse. [00:04:53] Finally, the third principle is using social choice theory to combine different viewpoints to ultimately inform the alignment of an AI model. One way of doing this is through democratic processes such as voting, discussion, and civic engagement to arrive at values. The other is by combining individual preferences into a single ranking.
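One standard way to combine individual preference rankings into a single ranking is a positional rule such as the Borda count. The sketch below is only an illustration of that idea; the candidate values and stakeholder rankings are hypothetical, not from the lecture:

```python
from collections import defaultdict

def borda_count(rankings):
    """Combine individual preference rankings into one social ranking.

    Each ranking lists candidates from most to least preferred; a
    candidate earns (n - position - 1) points per voter, and the
    aggregate ranking sorts candidates by total points.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - position - 1
    return sorted(scores, key=lambda c: -scores[c])

# Hypothetical stakeholder rankings over candidate values for an AI system.
rankings = [
    ["safety", "privacy", "efficiency"],
    ["privacy", "safety", "efficiency"],
    ["safety", "efficiency", "privacy"],
]
print(borda_count(rankings))  # → ['safety', 'privacy', 'efficiency']
```

Here "safety" wins because it is ranked first or second by every stakeholder, even though one voter prefers "privacy"; that robustness to extreme individual views is one reason positional rules are used for aggregation.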
And again, these are not the only frameworks that would be appropriate to align an AI model with values, but they should give you a starting point. [00:05:23] Now let's take a look at those three principles in practice, to help make those definitions a bit more concrete. Consider self-driving cars. If we are aligning values with the global public morality and human rights framework, we might consider existing regulations set by government entities; for example, the State of California Department of Motor Vehicles has a set of standards defining specific terms related to autonomous vehicles, requirements for testing permits, and requirements for test drivers. If we are selecting values using Rawls' veil of ignorance thought experiment, we might consider who is at greatest risk, in order to prioritize the least well off. For example, pedestrians with darker skin might be more likely to get hit by a self-driving car than white pedestrians, so maybe this informs the values we select for how to test AI models in the real world.
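The risk-aversion assumption behind the veil of ignorance can be made concrete with a toy calculation: not knowing which position you would occupy, you might judge a policy by its worst-off position (the Rawlsian maximin rule) rather than by its average outcome. The two testing policies and their utility numbers below are invented purely for illustration:

```python
# Hypothetical utilities of each societal position under two testing policies.
policies = {
    "test_on_public_roads":  {"driver": 9, "pedestrian": 2},
    "test_in_closed_course": {"driver": 6, "pedestrian": 4},
}

def maximin_choice(policies):
    """Pick the policy whose worst-off position fares best (Rawlsian maximin)."""
    return max(policies, key=lambda p: min(policies[p].values()))

def average_choice(policies):
    """Pick the policy with the highest average utility across positions."""
    return max(policies, key=lambda p: sum(policies[p].values()) / len(policies[p]))

print(maximin_choice(policies))   # → test_in_closed_course (pedestrians fare better)
print(average_choice(policies))   # → test_on_public_roads (higher average utility)
```

The two rules disagree on purpose: averaging favors the policy that is great for drivers, while maximin favors the one that protects whoever ends up worst off, which is exactly the choice someone behind the veil is assumed to make.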
[00:06:16] Finally, if we are using social choice theory, we might involve different stakeholders in collectively determining how research, deployment, and oversight of autonomous vehicles are conducted. [00:06:25] The alignment problem has important implications for real life, as systems that are misaligned with their users and society's goals can cause significant harm. Let's look at some more examples. The first is Tay. Tay was a Microsoft AI chatbot launched on Twitter in March of 2016; in less than a day it was taken down, because it was generating tweets and replies that were considered racist and sexist. The bot's behavior wasn't necessarily due to a programming error; instead, it was because the developers had not given the bot an understanding of appropriate human behavior. In the absence of that, the bot began to mimic the harmful behavior it saw among other Twitter users.
[00:07:07] We also see AI misalignment in medical applications of AI. For example, one algorithm used in the US to identify patients who might need additional care uses cost as a measure of health care need. However, because of unequal access to health care, typically less money is spent on care for Black patients compared to white patients, and this leads the algorithm to prioritize white patients over sicker Black patients. As another example in this space, during the height of the COVID-19 pandemic, Facebook tried to promote vaccine-related content from government agencies to encourage people to get vaccinated. Presumably, the intended values here aligned with society's goal to stop the spread of the disease by getting more people vaccinated, yet these posts from official sources ended up being flooded with critical comments, including misinformation.
And as posts with antivaccine comments became more visible to Facebook users, it may have undermined vaccine uptake. [00:08:04] So recall that in the mountain car assignment you learned about safe exploration in reinforcement learning as one example of a problem in AI safety. Two other problems in AI safety, which are also examples of the AI alignment problem, are reward hacking and negative side effects. While these relate to AI safety and reinforcement learning, they're also relevant to other types of AI algorithms, such as large language models, evolutionary algorithms, and genetic algorithms, so in this video we'll talk about them broadly rather than for a specific type of algorithm. [00:08:41] Let's begin by discussing reward hacking. Reward hacking occurs when an agent games its reward function. By doing this, the agent discovers a clever or easy solution that still formally satisfies the qualifications to acquire rewards
and is able to maximize the rewards it receives. The solution it has discovered might not align with the spirit of the designer's intent; in other words, the agent optimizes the formal objective function but doesn't learn the outcome intended by the programmer or designer. For example, if we reward a cleaning robot for picking up messes, one way in which it might game its reward function is by hiding messes behind furniture or under the rug; another way is by bringing in more trash and starting over once it's done, to keep receiving the rewards. [00:09:24] Let's consider two examples of reward hacking. In the first, a reinforcement learning agent that was designed to move a block to a certain position on a table learned to move the table rather than the block. In the second, ChatGPT made up fake cases related to a prompt when it was asked by a lawyer to deliver cases.
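The cleaning-robot example can be sketched as a toy simulation: if the reward is simply +1 per mess picked up, a policy that re-dumps and re-collects the same trash outscores a policy that actually cleans the room. The episode length and reward values here are invented for illustration, not the lecture's formal setup:

```python
def episode_reward(policy, messes=3, steps=10):
    """Simulate a cleaning robot paid +1 per mess it picks up.

    'honest' picks up each mess once and stops when the room is clean;
    'hacker' dumps the trash back out and re-collects it every step,
    formally satisfying the reward condition while cleaning nothing.
    """
    reward, remaining = 0, messes
    for _ in range(steps):
        if policy == "honest":
            if remaining > 0:
                remaining -= 1
                reward += 1        # one real mess cleaned
        else:  # "hacker"
            reward += messes       # re-dump and re-collect the same messes
    return reward

print(episode_reward("honest"))   # → 3: the room is actually clean
print(episode_reward("hacker"))   # → 30: far more reward, nothing accomplished
```

The point is that the reward function, not the learning algorithm, is what failed: both policies maximize exactly what they were told to maximize.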
[00:09:45] Reward hacking arises from misspecified rewards, when important aspects of the reward have been left out in the design process, leading to poor agent behavior. One way to mitigate reward hacking is to anticipate and penalize possible misbehavior in advance, but some things will be missed due to human error. Addressing these limitations is still an open problem in AI research. [00:10:09] Now we'll discuss negative side effects. Negative side effects arise when an agent's behavior, while pursuing its goals, ends up conflicting with broader societal values. Going back to the example of a cleaning robot, the robot might knock over a vase or push people and pets out of the way because it can clean faster by doing so. Some examples of negative side effects include an autonomous agent that splashes water on nearby pedestrians as it rolls by, or an AI system that completely displaces workers in a particular industry.
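The mitigation idea of anticipating and penalizing misbehavior in the objective can be sketched numerically: score two candidate trajectories with and without an explicit penalty for an anticipated side effect such as a broken vase. The weights and trajectory numbers below are arbitrary illustrations:

```python
def trajectory_return(time_steps, vases_broken, side_effect_penalty=0.0):
    """Return of a cleaning trajectory: faster is better, minus any
    penalty the designer chose to attach to anticipated side effects."""
    return 100 - time_steps - side_effect_penalty * vases_broken

fast_but_careless = {"time_steps": 10, "vases_broken": 1}
slow_but_careful  = {"time_steps": 15, "vases_broken": 0}

for penalty in (0.0, 20.0):
    best = max(
        (fast_but_careless, slow_but_careful),
        key=lambda t: trajectory_return(t["time_steps"], t["vases_broken"], penalty),
    )
    print(penalty, best)  # penalty 0.0 prefers careless; 20.0 prefers careful
```

With no penalty, the careless trajectory wins (90 vs. 85); once the designer prices the vase into the objective, the careful trajectory wins (85 vs. 70). The catch, as noted above, is that this only works for side effects the designer thought to penalize.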
[00:10:45] Negative side effects occur because the agent's model and objective function focus on some aspects of the environment over others. This can happen because of misalignment, distributional shifts, or the agent having incomplete knowledge. Misaligned systems are more likely to produce negative side effects because they are not aligned with users' intentions and goals. However, negative side effects can occur even in contexts where the agent optimizes values that align with users' objectives: for example, if an AI system is deployed in an environment that is different from the one it was tested in, and it does not have enough information about how to respond to a new scenario, negative side effects may occur. ================================================================================ LECTURE 059 ================================================================================ Stanford CS221 I Encoding Human Values I 2023 Source: https://www.youtube.com/watch?v=aWAqgzXENr0 --- Transcript [00:00:06] Hi, and welcome to your embedded ethics
module on encoding human values in technology. [00:00:12] Let's begin by talking about the main framework of values and design. This entire framework is based on the idea that design decisions are expressive of what we care about: when we create a certain technology and we design the features of that technology, those decisions encode our values, including efficiency, privacy, beauty, truth, fairness, sustainability, and others. [00:00:43] Take the example of Pi. This is a personal AI, or a conversational bot, that was designed more or less recently by the company Inflection AI. Now, a personal AI is designed to provide a conversational partner for people who cannot, or for some reason don't want to, have conversations with other human beings. This kind of technology may be driven by values such as empathy, respect, solidarity, kindness, support, etc. You can see how these values are quite apparent in the kind of language
that is generated by the chatbot and the kinds of conversations that it offers to users; it is embedded into the user interface and all kinds of features that are part of the technology. [00:01:38] Now, in the case of Pi, values emerge quite clearly from the designer's definition of the problem that the technology tries to address and the specification of design features that allow the product to solve that problem. These decisions on the part of the designer interact with users' perceptions and the broader context in which the design is employed. [00:02:11] Now, when trying to identify the values coded into any form of technology, there are two things that we should do: locate these values and define them. So let me talk about each of these in turn. To locate values, we need to look at the key influences that shape the design process.
This includes looking at who the key actors are: the designers, of course, but also the stakeholders and the users. Then there's the functional description: what problem or need is this technology addressing? It may turn out that the very problem, or the very target that is sought by the technology, is specified in terms of values; take the example of privacy-enhancing web browsers. One thing we should also look at is constraints. These constraints may be economic, technical, commercial, or legal, and all of them shape design decisions that will then end up embedding one value or another into the technology. And lastly, there's societal input: culture and social mores will shape what we can do and how it is interpreted by users. [00:03:31] Now, having identified these sources of influence, you want to consider how they shape the design, and in what ways
they channel users' and others' interpretations of the values that are embedded in that design. [00:03:47] Here I want to call your attention to a very important concept that Helen Nissenbaum refers to as collateral values. These are values that crop up as side effects of design decisions, even though they're not intended by designers. So, if you remember, the values that are coded into a product like Pi are quite explicit and quite deliberate. Collateral values are not like that: they arise, or emerge, from the way in which the design interacts with the world. These are important because they may drive serious wedges between what designers intend to express and what they end up expressing in their design; they drive wedges between intentions and actual impacts in the world. [00:04:41] One particular way in which values may unintentionally crop up is through standardization. Standardization happens when we
make implicit assumptions about who the standard user is for a given technology, or who the standard person is who is going to benefit from this technology. This is crucial because default assumptions are often a reflection of existing power imbalances, and when they go unquestioned they contribute to reinforcing those imbalances, by unwittingly discriminating against those who do not resemble the standard user or the standard person that is meant to benefit from this technology. [00:05:27] Biases play a crucial role in standardization, and these may include pre-existing biases that are already in society, in culture, or in institutions; technical biases; or emergent biases, biases that emerge when the product is used in a context that is different from its original context of use. [00:05:54] So when we assume that a certain standard user for a product is the person that we're going to be designing for,
we assume that everybody else is a non-standard user, and this places a burden of sorts on members of the groups that we fail to consider, or that we fail to include in our design, because it is harder for them to use the product and to benefit from it. Sometimes that's okay: every single decision that we make places burdens on some people and not others, and every single decision that we make provides benefits to some people and not others. But these decisions aggregate, and if all of the burdens fall on the same group or groups of people, then we enter into problems of distributive justice. [00:06:49] Now, a lot of decisions about what can be offered and what needs to be addressed may be constrained by technical or economic considerations, but some aren't, and it's important that we treat them as decisions: decisions about who we are benefiting when
technology in a certain way. Here's one interesting example of how bias can be encoded into an AI system by taking one default viewpoint. Perspective API offers automated detection of toxic content and toxicity mitigation, which is crucial when you're building a large language model, but also for tasks like the semi-automated content moderation that happens on digital platforms and in public forums of all kinds. [00:07:48] Now, in a paper published a couple of years ago, researchers from DeepMind analyzed the toxicity scores generated by Perspective API and how they impacted different groups of people. One of their key insights was to look into the definition of toxicity. As you see on the screen, the definition of toxicity is rude, disrespectful, or unreasonable language that is likely to make someone leave a
discussion. [00:08:18] Now, the authors of this paper noticed that this definition already builds biases into the system. For one thing, it is somewhat subjective and dependent on the cultural background of whoever is rating a piece of content and the kinds of sensitivities that they might have. For another, it covers only a subset of possibly harmful content: it does not cover, for instance, harmful stereotypes that may be perpetuated by something like an LLM. But most importantly, this definition prioritizes the interests of the main customers of Perspective API, namely the digital platforms that are using these toxicity mitigation tools. For these customers, the business model depends on user engagement, so a definition of toxicity that focuses on what makes somebody leave a discussion is one that seeks to protect the interests of those customers. However, this definition
does not necessarily build in the interests of the users themselves. So there is one group whose interests are being prioritized over others. This constitutes a good example of collateral values cropping up through default assumptions that get built into something as basic as the definition of the key metric in Perspective API. [00:09:48] So once we have located the values that are explicitly or implicitly embedded in design decisions, it is important to define those values. Why is that? Because ethical and political values are abstract, controversial, and difficult to define. Definition and analysis allow us to connect the abstract values that we want to encode or embed in our technologies to concrete design features. If values are not well defined, products can entirely miss their marks. So this is not simply a philosophical exercise; it is a way of
ensuring that you embed the values that you actually want to embed into the technologies that you create. [00:10:32] Think about an example that I find really interesting: some of Microsoft's early chatbots from around ten years ago. Microsoft tried to create chatbots that were inclusive. Now, if you want to make something inclusive, one possible definition of inclusion is to make a technology that is welcoming to any kind of content and any kind of topic that users are interested in. That was the approach taken by Microsoft when they designed their now infamous Tay. Tay was, as you may know, targeted by a campaign coordinated on 4chan and ended up becoming highly racist and sexist in a matter of hours, to the point that Microsoft had to take it down. Now, after Tay, Microsoft designed Zo, and Zo had a different
definition of inclusion, namely protecting vulnerable users from insult and psychological harm. The way it did this was by blacklisting certain words or topics and rejecting any conversation that went into those areas. [00:11:51] Now, that also led to the outright exclusion of users who wanted to talk about, say, being bullied for religious reasons. So that didn't quite do the job either. The point here is that the definition of inclusion was something that needed to be considered more carefully in this case, so that the products did not miss the mark completely, as these two chatbots did. Now, to finalize, I want to talk about value conflicts. Value conflicts are a crucial part of the story because they are ubiquitous, not just in technology but everywhere. We are used to discussing value conflicts in policy making or in politics, but also in design
decisions. They're important: they arise even in relatively uncontroversial contexts, and this is not the result of poor design; it is the inevitable consequence of recognizing that different things matter to us, sometimes to different degrees but often to the same degree, and that makes it hard to choose between things that are truly important. [00:13:02] This is a result of value pluralism, which we recognize as a society. Now, although these conflicts are everywhere, and although they may appear intractable, this does not mean that we should throw up our hands. Rather, we should strive to make deliberate, conscientious, and responsible choices in how we deal with these value conflicts. The values and design framework describes three different approaches to value conflicts. There's dissolving the conflict, which means finding an alternative path that avoids the conflict
entirely. There's compromising, which means making design decisions that put boundaries on some values to protect others, and vice versa: finding some kind of middle ground where some part of what we value on both sides may be attained. [00:13:59] And lastly, there is trading off, which means that we simply decide to prioritize one value and sacrifice others for its sake. Now, different situations may call for different approaches, but what matters, again, is that these decisions are made deliberately and responsibly: that we know that we are making sacrifices, and how these sacrifices are likely to impact different people, so that we can, to the best of our capacity, mitigate the negative impacts of our decisions. And that is all, so thank you very much.

================================================================================
LECTURE 060
================================================================================

Stanford CS221 | Algorithms and Distribution | 2023
Source: https://www.youtube.com/watch?v=olhFrDHP7iU

---

Transcript

[00:00:05] Hi everybody, my name is di Costa, I'm
your embedded ethics fellow for CS221. Welcome to our first mini video lecture. During this term we're going to be pairing short video lectures with the assignments that contain ethics questions, and you can use them as a reference for these assignments and in the future. [00:00:29] Right now we're going to be talking about algorithms and distribution. When you consider decision making from an algorithmic point of view, different algorithms may lead to different distributions of benefits and burdens in a population. What we're hoping you ask yourself with this assignment question is how to evaluate these distributions from an ethical perspective, and for that you need to appeal to a field of moral and political philosophy known as distributive justice. The principles of distributive justice are those that provide moral guidance for the processes and structures
that affect the distribution of benefits and burdens in societies or among populations. This is taken from the Stanford Encyclopedia of Philosophy. [00:01:17] Principles of distributive justice are applicable to all kinds of decisions that generate distributions of burdens and benefits, which may be algorithmic or otherwise. Now, what I'm going to do is give you a list of principles of distributive justice that you can use to evaluate different courses of action. Please keep in mind that these are simplified and highly intuitive versions of the principles, and this is also a non-exhaustive list: there are many more principles that you can appeal to when considering how to distribute burdens and benefits. If you're interested in finding more information, you can look at the footnotes on your assignment,
which link to various resources where you can find a lot more in-depth information about distributive justice. [00:02:18] Now, before jumping in, it's important to think about the definition of what a principle is. Think of a moral principle as a kind of norm that dictates a policy or a course of action in a given situation. What I'm going to do now is introduce three principles of distributive justice by explaining how they support different courses of action in a particular decision scenario. That decision scenario is going to be the distribution of vaccines; that is going to be our toy example in this mini lecture. So, having a limited number of vaccines and a large population, how do you allocate them? How do you allocate those vaccines among the people that you need to serve? Now, in reality one would need to consider the question at a certain level, either a
global, national, or local level. Because we will not look at the details of how this happened in the real world or in any specific setting, we're rather going to ask ourselves the question in the abstract, but you can think about it at the local level. [00:03:32] So, one potential policy would be to distribute vaccines in a way that ensures that as many people as possible get vaccinated, regardless of who they are. This policy would ensure that the highest number of people were vaccinated at the lowest cost and in the shortest time frame. You could achieve this by, say, setting up vaccination centers in densely populated areas of the city, so that you reach as many people as possible. That policy would be supported by a moral principle focused on maximizing well-being, that is, on securing the greatest net benefit. This principle is framed by a philosophical framework known as
utilitarianism, which is a kind of consequentialism according to which the right action to perform in any given circumstance is the one that maximizes utility, that is, the action that in the aggregate brings about the highest net benefit. [00:04:42] A second course of action would be to ensure that the most vulnerable populations have access to vaccines before anybody else; you could determine this on the basis of age, race, class, or abilities. This course of action would be supported by a principle focused on prioritizing those who are worst off, that is, choosing distributions that ensure that those who are the worst off are served first. There are different versions of this principle that fit under different philosophical frameworks, such as prioritarianism, which mandates that we give priority to the well-being of individuals who
are worse off, or Rawls's difference principle, according to which any inequality in the distribution of social goods should be such that it benefits those who are worst off. The third policy that we're going to consider is one that dictates that you vaccinate the members of historically marginalized groups first, by, for instance, setting vaccination sites in minority neighborhoods. [00:05:48] Why would you do this? Think about it this way: by delaying vaccination for populations that have been historically marginalized, you are placing an additional burden on the members of these groups, say by inhibiting their ability to return to work and secure income for themselves and their families, and this compounds the effects of historical discrimination. This policy would be supported by a principle focused on avoiding courses of action that disproportionately burden members of marginalized communities.
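The three policies just described can be phrased as ranking rules over the same population. Here is a minimal illustrative sketch of that idea; the population, its attributes, and the scoring rules are invented for this note (not from the lecture), and the point is only that the same limited supply gets allocated differently under each principle:

```python
# Sketch (invented toy data): three allocation rules for a limited vaccine
# supply, each corresponding to one distributive-justice principle.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Person:
    cost_to_reach: float   # lower = cheaper to vaccinate (e.g. dense area)
    vulnerability: float   # higher = worse off
    marginalized: bool     # member of a historically marginalized group

def allocate(pop: List[Person], supply: int,
             priority: Callable[[Person], object]) -> List[Person]:
    """Give the `supply` doses to the people ranked highest by `priority`."""
    return sorted(pop, key=priority, reverse=True)[:supply]

# Principle 1: maximize well-being -> cover the cheapest-to-reach people,
# so a fixed budget vaccinates as many people as possible.
utilitarian = lambda p: -p.cost_to_reach
# Principle 2: prioritize the worst off (prioritarianism / Rawls).
prioritarian = lambda p: p.vulnerability
# Principle 3: avoid compounding historical injustice -> marginalized
# members first, breaking ties by vulnerability.
anti_compounding = lambda p: (p.marginalized, p.vulnerability)

pop = [
    Person(cost_to_reach=1.0, vulnerability=0.2, marginalized=False),
    Person(cost_to_reach=1.0, vulnerability=0.9, marginalized=False),
    Person(cost_to_reach=5.0, vulnerability=0.8, marginalized=True),
    Person(cost_to_reach=4.0, vulnerability=0.5, marginalized=True),
]
for name, rule in [("utilitarian", utilitarian),
                   ("prioritarian", prioritarian),
                   ("anti-compounding", anti_compounding)]:
    chosen = allocate(pop, supply=2, priority=rule)
    print(name, [pop.index(c) for c in chosen])
```

With this toy population the utilitarian rule picks the two cheapest-to-reach people, the prioritarian rule picks the two most vulnerable, and the anti-compounding rule picks the marginalized members first, so the same algorithmic skeleton produces three different distributions of the same benefit, which is exactly the contrast the assignment question asks you to evaluate.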
Some have called this the anti-compounding injustice principle, and it is driven by the idea that algorithmic decision systems should deliberately focus on avoiding contributing to historical injustice and discrimination. [00:06:41] Now, to summarize, I have presented you with three principles of distributive justice, or, as I said, intuitive versions of these principles: one that focuses on maximizing well-being, one that focuses on prioritizing those who are worse off, and one that focuses on avoiding compounding historical injustice. Please remember, again, that this is an intuitive and not an exhaustive list of distributive justice principles. But what matters here is that these principles are applicable to distributions of benefits and burdens through algorithmic decision making.

================================================================================
LECTURE INDEX.md
================================================================================

CS221 – Artificial Intelligence

Playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rOca_Ovz1DvdtWuz8BfSWL2
Total Videos: 60
Transcripts Downloaded: 60
Failed/No Captions: 0

---

Lectures

1. General Intro | Stanford CS221: Artificial Intelligence: Principles and Techniques (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=ZiwogMtbjr4](https://www.youtube.com/watch?v=ZiwogMtbjr4)
   - Transcript: [001_ZiwogMtbjr4.md](001_ZiwogMtbjr4.md)
2. AI History | Stanford CS221: AI (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=z8fEXuH0mu0](https://www.youtube.com/watch?v=z8fEXuH0mu0)
   - Transcript: [002_z8fEXuH0mu0.md](002_z8fEXuH0mu0.md)
3. Artificial Intelligence Today | Stanford CS221: AI (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=C0IhR4D5KYc](https://www.youtube.com/watch?v=C0IhR4D5KYc)
   - Transcript: [003_C0IhR4D5KYc.md](003_C0IhR4D5KYc.md)
4. Artificial Intelligence and Machine Learning 1 - Overview | Stanford CS221: AI (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=mtrYwgIrRNk](https://www.youtube.com/watch?v=mtrYwgIrRNk)
   - Transcript: [004_mtrYwgIrRNk.md](004_mtrYwgIrRNk.md)
5. Artificial Intelligence & Machine Learning 2 - Linear Regression | Stanford CS221: AI (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=nEWNNt2KmfQ](https://www.youtube.com/watch?v=nEWNNt2KmfQ)
   - Transcript: [005_nEWNNt2KmfQ.md](005_nEWNNt2KmfQ.md)
6. Artificial Intelligence & Machine learning 3 - Linear Classification | Stanford CS221 (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=WcaMiqJR09s](https://www.youtube.com/watch?v=WcaMiqJR09s)
   - Transcript: [006_WcaMiqJR09s.md](006_WcaMiqJR09s.md)
7. Artificial Intelligence & Machine Learning 4 - Stochastic Gradient Descent | Stanford CS221 (2021)
   - Video: [https://www.youtube.com/watch?v=bl2WgBLH0tI](https://www.youtube.com/watch?v=bl2WgBLH0tI)
   - Transcript: [007_bl2WgBLH0tI.md](007_bl2WgBLH0tI.md)
8.
Artificial Intelligence and Machine Learning 5 - Group DRO | Stanford CS221: AI (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=ZFK2XtWqUbw](https://www.youtube.com/watch?v=ZFK2XtWqUbw)
   - Transcript: [008_ZFK2XtWqUbw.md](008_ZFK2XtWqUbw.md)
9. Artificial Intelligence & Machine Learning 6 - Non Linear Features | Stanford CS221: AI (Autumn 2021)
   - Video: [https://www.youtube.com/watch?v=eIxbNkB4byY](https://www.youtube.com/watch?v=eIxbNkB4byY)
   - Transcript: [009_eIxbNkB4byY.md](009_eIxbNkB4byY.md)
10. Artificial Intelligence & Machine Learning 7 - Feature Templates | Stanford CS221: AI (Autumn 2021)
    - Video: [https://www.youtube.com/watch?v=2QfSBLtvioE](https://www.youtube.com/watch?v=2QfSBLtvioE)
    - Transcript: [010_2QfSBLtvioE.md](010_2QfSBLtvioE.md)
11. Artificial Intelligence & Machine Learning 8 - Neural Networks | Stanford CS221: AI (Autumn 2021)
    - Video: [https://www.youtube.com/watch?v=pnKXgBHuN58](https://www.youtube.com/watch?v=pnKXgBHuN58)
    - Transcript: [011_pnKXgBHuN58.md](011_pnKXgBHuN58.md)
12. Machine Learning 9 - Backpropagation | Stanford CS221: AI (Autumn 2021)
    - Video: [https://www.youtube.com/watch?v=OcAF-l2xB9Y](https://www.youtube.com/watch?v=OcAF-l2xB9Y)
    - Transcript: [012_OcAF-l2xB9Y.md](012_OcAF-l2xB9Y.md)
13. Machine Learning 10 - Differentiable Programming | Stanford CS221: AI (Autumn 2021)
    - Video: [https://www.youtube.com/watch?v=c5btEEisp_g](https://www.youtube.com/watch?v=c5btEEisp_g)
    - Transcript: [013_c5btEEisp_g.md](013_c5btEEisp_g.md)
14. Artificial Intelligence & Machine Learning 11 - Generalization | Stanford CS221: AI (Autumn 2021)
    - Video: [https://www.youtube.com/watch?v=Gq-Ah-QrOQM](https://www.youtube.com/watch?v=Gq-Ah-QrOQM)
    - Transcript: [014_Gq-Ah-QrOQM.md](014_Gq-Ah-QrOQM.md)
15.
Artificial Intelligence & Machine Learning 12 - Best Practices | Stanford CS221: AI (Autumn 2021)
    - Video: [https://www.youtube.com/watch?v=ouvGV2YZEEM](https://www.youtube.com/watch?v=ouvGV2YZEEM)
    - Transcript: [015_ouvGV2YZEEM.md](015_ouvGV2YZEEM.md)
16. Machine Learning 13 - K-means | Stanford CS221: AI (Autumn 2021)
    - Video: [https://www.youtube.com/watch?v=5-Fn8R9fH7A](https://www.youtube.com/watch?v=5-Fn8R9fH7A)
    - Transcript: [016_5-Fn8R9fH7A.md](016_5-Fn8R9fH7A.md)
17. Search 1 - Dynamic Programming, Uniform Cost Search | Stanford CS221: AI (Autumn 2019)
    - Video: [https://www.youtube.com/watch?v=aIsgJJYrlXk](https://www.youtube.com/watch?v=aIsgJJYrlXk)
    - Transcript: [017_aIsgJJYrlXk.md](017_aIsgJJYrlXk.md)
18. Search 2 - A* | Stanford CS221: Artificial Intelligence (Autumn 2019)
    - Video: [https://www.youtube.com/watch?v=HEs1ZCvLH2s](https://www.youtube.com/watch?v=HEs1ZCvLH2s)
    - Transcript: [018_HEs1ZCvLH2s.md](018_HEs1ZCvLH2s.md)
19. Markov Decision Processes 1 - Value Iteration | Stanford CS221: AI (Autumn 2019)
    - Video: [https://www.youtube.com/watch?v=9g32v7bK3Co](https://www.youtube.com/watch?v=9g32v7bK3Co)
    - Transcript: [019_9g32v7bK3Co.md](019_9g32v7bK3Co.md)
20. Markov Decision Processes 2 - Reinforcement Learning | Stanford CS221: AI (Autumn 2019)
    - Video: [https://www.youtube.com/watch?v=HpaHTfY52RQ](https://www.youtube.com/watch?v=HpaHTfY52RQ)
    - Transcript: [020_HpaHTfY52RQ.md](020_HpaHTfY52RQ.md)
21. Game Playing 1 - Minimax, Alpha-beta Pruning | Stanford CS221: AI (Autumn 2019)
    - Video: [https://www.youtube.com/watch?v=3pU-Hrz_xy4](https://www.youtube.com/watch?v=3pU-Hrz_xy4)
    - Transcript: [021_3pU-Hrz_xy4.md](021_3pU-Hrz_xy4.md)
22. Game Playing 2 - TD Learning, Game Theory | Stanford CS221: Artificial Intelligence (Autumn 2019)
    - Video: [https://www.youtube.com/watch?v=WoFwXj4p4Sc](https://www.youtube.com/watch?v=WoFwXj4p4Sc)
    - Transcript: [022_WoFwXj4p4Sc.md](022_WoFwXj4p4Sc.md)
23.
Constraint Satisfaction Problems (CSPs) 1 - Overview | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=-IO4fPO0rxk](https://www.youtube.com/watch?v=-IO4fPO0rxk) - Transcript: [023_-IO4fPO0rxk.md](023_-IO4fPO0rxk.md) 24. Constraint Satisfaction Problems (CSPs) 2 - Definitions | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=uj5wCcHsSlA](https://www.youtube.com/watch?v=uj5wCcHsSlA) - Transcript: [024_uj5wCcHsSlA.md](024_uj5wCcHsSlA.md) 25. Constraint Satisfaction Problems (CSPs) 3 - Examples | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=Tu6BiZhMDCc](https://www.youtube.com/watch?v=Tu6BiZhMDCc) - Transcript: [025_Tu6BiZhMDCc.md](025_Tu6BiZhMDCc.md) 26. Constraint Satisfaction Problems (CSPs) 4 - Dynamic Ordering | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=Lyu8VzbIe_A](https://www.youtube.com/watch?v=Lyu8VzbIe_A) - Transcript: [026_Lyu8VzbIe_A.md](026_Lyu8VzbIe_A.md) 27. Constraint Satisfaction Problems (CSPs) 5 - Arc Consistency | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=5rlIYGJdPy4](https://www.youtube.com/watch?v=5rlIYGJdPy4) - Transcript: [027_5rlIYGJdPy4.md](027_5rlIYGJdPy4.md) 28. Constraint Satisfaction Problems (CSPs) 6 - Beam Search | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=XuWMeIHGkus](https://www.youtube.com/watch?v=XuWMeIHGkus) - Transcript: [028_XuWMeIHGkus.md](028_XuWMeIHGkus.md) 29. Constraint Satisfaction Problems (CSPs) 7 - Local Search | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=VwZKPlK6jUg](https://www.youtube.com/watch?v=VwZKPlK6jUg) - Transcript: [029_VwZKPlK6jUg.md](029_VwZKPlK6jUg.md) 30. Markov Networks 1 - Overview | Stanford CS221: Artificial Intelligence (Autumn 2021) - Video: [https://www.youtube.com/watch?v=neeaJb3wCYw](https://www.youtube.com/watch?v=neeaJb3wCYw) - Transcript: [030_neeaJb3wCYw.md](030_neeaJb3wCYw.md) 31. 
Markov Networks 2 - Gibbs Sampling | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=k6aZZF2pk7k](https://www.youtube.com/watch?v=k6aZZF2pk7k) - Transcript: [031_k6aZZF2pk7k.md](031_k6aZZF2pk7k.md) 32. Bayesian Networks 1 - Overview | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=fA7zP6EcVdw](https://www.youtube.com/watch?v=fA7zP6EcVdw) - Transcript: [032_fA7zP6EcVdw.md](032_fA7zP6EcVdw.md) 33. Bayesian Networks 2 - Definition | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=xvC6XmZmR_U](https://www.youtube.com/watch?v=xvC6XmZmR_U) - Transcript: [033_xvC6XmZmR_U.md](033_xvC6XmZmR_U.md) 34. Bayesian Networks 3 - Probabilistic Programming | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=ZVk8y1zVoD4](https://www.youtube.com/watch?v=ZVk8y1zVoD4) - Transcript: [034_ZVk8y1zVoD4.md](034_ZVk8y1zVoD4.md) 35. Bayesian Networks 4 - Probabilistic Inference | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=-dGOWB9Zh8s](https://www.youtube.com/watch?v=-dGOWB9Zh8s) - Transcript: [035_-dGOWB9Zh8s.md](035_-dGOWB9Zh8s.md) 36. Bayesian Networks 5 - Forward-backward Algorithm | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=N-ZPbpJOQs0](https://www.youtube.com/watch?v=N-ZPbpJOQs0) - Transcript: [036_N-ZPbpJOQs0.md](036_N-ZPbpJOQs0.md) 37. Bayesian Networks 6 - Particle Filtering | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=8sOtXbQIOuE](https://www.youtube.com/watch?v=8sOtXbQIOuE) - Transcript: [037_8sOtXbQIOuE.md](037_8sOtXbQIOuE.md) 38. Bayesian Networks 7 - Supervised Learning | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=_rbDjsJTgm8](https://www.youtube.com/watch?v=_rbDjsJTgm8) - Transcript: [038__rbDjsJTgm8.md](038__rbDjsJTgm8.md) 39. 
Bayesian Networks 8 - Smoothing | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=M7rWvN_0xbw](https://www.youtube.com/watch?v=M7rWvN_0xbw) - Transcript: [039_M7rWvN_0xbw.md](039_M7rWvN_0xbw.md) 40. Bayesian Networks 9 - EM Algorithm | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=CPVFJBd-Qcg](https://www.youtube.com/watch?v=CPVFJBd-Qcg) - Transcript: [040_CPVFJBd-Qcg.md](040_CPVFJBd-Qcg.md) 41. Logic 1 - Overview: Logic Based Models | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=oM5LUGPO7Zk](https://www.youtube.com/watch?v=oM5LUGPO7Zk) - Transcript: [041_oM5LUGPO7Zk.md](041_oM5LUGPO7Zk.md) 42. Logic 2 - Propositional Logic Syntax | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=LBjNaewGJzk](https://www.youtube.com/watch?v=LBjNaewGJzk) - Transcript: [042_LBjNaewGJzk.md](042_LBjNaewGJzk.md) 43. Logic 3 - Propositional Logic Semantics | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=N37yIn1jX98](https://www.youtube.com/watch?v=N37yIn1jX98) - Transcript: [043_N37yIn1jX98.md](043_N37yIn1jX98.md) 44. Logic 4 - Inference Rules | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=RIk67yGMVv4](https://www.youtube.com/watch?v=RIk67yGMVv4) - Transcript: [044_RIk67yGMVv4.md](044_RIk67yGMVv4.md) 45. Logic 5 - Propositional Modus Ponens | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=6bj4z2mt1KE](https://www.youtube.com/watch?v=6bj4z2mt1KE) - Transcript: [045_6bj4z2mt1KE.md](045_6bj4z2mt1KE.md) 46. Logic 6 - Propositional Resolutions | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=egLAF4dFdBo](https://www.youtube.com/watch?v=egLAF4dFdBo) - Transcript: [046_egLAF4dFdBo.md](046_egLAF4dFdBo.md) 47. 
Logic 7 - First Order Logic | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=Z-O0Q3_oTJM](https://www.youtube.com/watch?v=Z-O0Q3_oTJM) - Transcript: [047_Z-O0Q3_oTJM.md](047_Z-O0Q3_oTJM.md) 48. Logic 8 - First Order Modus Ponens | Stanford CS221: Artificial Intelligence (Autumn 2021) - Video: [https://www.youtube.com/watch?v=mndzhfBpyUw](https://www.youtube.com/watch?v=mndzhfBpyUw) - Transcript: [048_mndzhfBpyUw.md](048_mndzhfBpyUw.md) 49. Logic 9 - First Order Resolution | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=iG_tz7ZjZAI](https://www.youtube.com/watch?v=iG_tz7ZjZAI) - Transcript: [049_iG_tz7ZjZAI.md](049_iG_tz7ZjZAI.md) 50. Logic 10 - Recap | Stanford CS221: Artificial Intelligence (Autumn 2021) - Video: [https://www.youtube.com/watch?v=LYsOjtmLpPo](https://www.youtube.com/watch?v=LYsOjtmLpPo) - Transcript: [050_LYsOjtmLpPo.md](050_LYsOjtmLpPo.md) 51. AI and Law I Mariano-Florentino Cuéllar, President of the Carnegie Endowment for International Peace - Video: [https://www.youtube.com/watch?v=_-hBu3_Jz-0](https://www.youtube.com/watch?v=_-hBu3_Jz-0) - Transcript: [051__-hBu3_Jz-0.md](051__-hBu3_Jz-0.md) 52. Stanford Fireside Talks: Robustness in Machine Learning I Robust Machine Learning - Video: [https://www.youtube.com/watch?v=xr8AHGlieOE](https://www.youtube.com/watch?v=xr8AHGlieOE) - Transcript: [052_xr8AHGlieOE.md](052_xr8AHGlieOE.md) 53. Fireside Talks: State of Robotics I Automation and Robotics Engineering Lectures - Stanford - Video: [https://www.youtube.com/watch?v=hVsR9DdR3qE](https://www.youtube.com/watch?v=hVsR9DdR3qE) - Transcript: [053_hVsR9DdR3qE.md](053_hVsR9DdR3qE.md) 54. Stanford Talk: Inequality in Healthcare, AI & Data Science to Reduce Inequality - Improve Healthcare - Video: [https://www.youtube.com/watch?v=0IZhDmh1dmI](https://www.youtube.com/watch?v=0IZhDmh1dmI) - Transcript: [054_0IZhDmh1dmI.md](054_0IZhDmh1dmI.md) 55. 
Fireside Talks: Artificial Intelligence (AI) and Language - Video: [https://www.youtube.com/watch?v=pI72PseZQo8](https://www.youtube.com/watch?v=pI72PseZQo8) - Transcript: [055_pI72PseZQo8.md](055_pI72PseZQo8.md) 56. General Conclusion | Stanford CS221: AI (Autumn 2021) - Video: [https://www.youtube.com/watch?v=iUGmupxCdjs](https://www.youtube.com/watch?v=iUGmupxCdjs) - Transcript: [056_iUGmupxCdjs.md](056_iUGmupxCdjs.md) 57. Stanford CS221 I Externalities and Dual-Use Technologies I 2023 - Video: [https://www.youtube.com/watch?v=2xQLCXqOtdU](https://www.youtube.com/watch?v=2xQLCXqOtdU) - Transcript: [057_2xQLCXqOtdU.md](057_2xQLCXqOtdU.md) 58. Stanford CS221 I The AI Alignment Problem: Reward Hacking & Negative Side Effects I 2023 - Video: [https://www.youtube.com/watch?v=5WHObJWE1FE](https://www.youtube.com/watch?v=5WHObJWE1FE) - Transcript: [058_5WHObJWE1FE.md](058_5WHObJWE1FE.md) 59. Stanford CS221 I Encoding Human Values I 2023 - Video: [https://www.youtube.com/watch?v=aWAqgzXENr0](https://www.youtube.com/watch?v=aWAqgzXENr0) - Transcript: [059_aWAqgzXENr0.md](059_aWAqgzXENr0.md) 60. Stanford CS221 I Algorithms and Distribution I 2023 - Video: [https://www.youtube.com/watch?v=olhFrDHP7iU](https://www.youtube.com/watch?v=olhFrDHP7iU) - Transcript: [060_olhFrDHP7iU.md](060_olhFrDHP7iU.md)